Abstract. In this paper, we propose an inexact perturbed path-following algorithm in the framework of Lagrangian dual decomposition for solving large-scale structured convex optimization problems. Unlike the exact versions considered in the literature, we allow one to solve the primal problem inexactly up to a given accuracy. The inexact perturbed algorithm allows one to use both approximate Hessian matrices and approximate gradient vectors to compute Newton-type directions for the dual problem. The algorithm is divided into two phases. The first phase computes an initial point by means of inexact perturbed damped Newton-type iterations, while the second one performs the path-following algorithm with inexact perturbed full-step Newton-type iterations. We analyze the convergence of both phases and estimate the worst-case complexity. As a special case, an exact path-following algorithm for Lagrangian relaxation is derived and its worst-case complexity is estimated. This variant possesses some differences compared to the previously known methods. Implementation details are discussed and numerical results are reported.

Key words. Smoothing technique, self-concordant barrier, Lagrangian decomposition, inexact 
perturbed Newton-type method, separable convex optimization, parallel algorithm. 

1. Introduction. Many optimization problems arising in networked systems, image processing, data mining, economics, distributed model predictive control and multi-stage stochastic optimization can be formulated as separable convex optimization problems, see, e.g., [6, 11, 13, 15, 20, 25, 26]. If the optimization problem has moderate size or possesses a sparsity structure, then it can be solved efficiently by standard optimization methods. In many practical situations, however, we encounter problems which are not easy to solve by standard optimization algorithms due to their high dimensionality or the distributed locations of the data and devices. Nevertheless, many problems can be reformulated as separable convex optimization problems such that the subproblems generated from their components can be solved in closed form or much more easily than the full problem.

In this paper, we are interested in the following convex separable optimization 
problem: 

$$
\phi^* := \max_{x} \Big\{ \phi(x) := \sum_{i=1}^{M} \phi_i(x_i) \Big\} \quad \text{s.t.} \quad x_i \in X_i \ (i = 1, \dots, M), \quad \sum_{i=1}^{M} A_i x_i = b, \tag{1.1}
$$

where $x = (x_1, \dots, x_M)^T$ with $x_i \in \mathbb{R}^{n_i}$ is a vector of decision variables, $\phi_i : \mathbb{R}^{n_i} \to \mathbb{R}$ is concave, $X_i$ is a nonempty, closed convex subset of $\mathbb{R}^{n_i}$, $A_i \in \mathbb{R}^{m \times n_i}$ and $b \in \mathbb{R}^m$ for all $i = 1, \dots, M$, and $n_1 + n_2 + \cdots + n_M = n$. The last constraint is usually referred to as a coupling linear constraint. Problems of the form (1.1) have been considered in many research papers, see, e.g., [1, 11, 15, 23]. Note that coupling linear inequality constraints of the form $\sum_{i=1}^{M} B_i x_i \le d$ can also be brought into the form (1.1) by using slack variables, see, e.g., [15].

Several methods solve problem (1.1) by decomposing it into small subproblems that can be solved separately by standard optimization techniques. For instance, by applying Lagrangian relaxation, the coupling constraint can be brought into the objective function and, by separability, the dual function can be decomposed into small subproblems [2]. However, such a Lagrangian relaxation technique generally leads to a nonsmooth optimization problem. There have been several attempts to overcome this difficulty by smoothing the dual function. One can add either an augmented Lagrangian term or a proximal term to the objective function of the problem. Unfortunately, the first approach breaks the separability of the original problem due to the cross terms between the components. Therefore, the second approach is more suitable for this type of problem.

Recently, smoothing techniques in convex optimization have attracted increasing interest and found many applications [18]. In the framework of Lagrangian dual decomposition, there are two popular approaches. The first approach is regularization. By adding a regularization term such as a proximal term to the objective function, the primal subproblem becomes strongly convex. Consequently, the master dual problem is smooth, which allows one to apply smooth optimization techniques [3, 5, 13, 23]. The second approach uses barrier functions; this technique is suitable for problems with conic constraints [3, 10, 12, 15, 22, 23, 28, 29]. Several methods in this direction are based on the fact that, by using a self-concordant log-barrier function, the family of dual functions depending on a barrier parameter is strongly self-concordant in the sense of Nesterov and Nemirovski [16] under certain assumptions. Consequently, path-following methods can be used to solve the master dual problem. Note that this technique is only applicable when either the objective function is linear, quadratic or self-concordant, or the problem is compatible in the sense that it possesses a property that makes the smooth objective function of the dual self-concordant. Several methods in this direction require the crucial assumption that the primal subproblems are solved exactly. In practice, solving the primal subproblems exactly to compute the dual function is only conceptual. Any numerical optimization method provides an approximate solution and, consequently, the dual function is also only approximated. This paper studies an inexact perturbed path-following method in the framework of Lagrangian decomposition for solving (1.1).

Contribution. The contribution of this paper is fivefold.

1. By applying a smoothing technique via self-concordant barrier functions, we provide a local and a global smooth approximation to the dual function and estimate the approximation error.

2. A new inexact perturbed path-following decomposition algorithm is proposed for solving (1.1). The algorithm consists of two phases. Both phases allow the primal subproblems to be solved approximately. Moreover, the algorithm is highly parallelizable.

3. The convergence theory is investigated under standard assumptions used in any interior point method, and the worst-case complexity is estimated.

4. When the primal problem is assumed to be solved exactly, our method reduces to the path-following method for Lagrangian decomposition considered in [12, 13, 22, 25]. However, the variant presented in this paper possesses a larger neighborhood of the analytic center in which convergence is guaranteed.

5. The implementation details are discussed and numerical experiments are performed to confirm the theoretical development.

Let us emphasize some differences between the method presented in this paper and the previously known methods.

1. Even though smoothing techniques based on self-concordant barriers are not new, in this paper we not only apply smoothing techniques to the dual problem but also provide some properties of the smooth function. The smooth approximation of the dual function only requires the objective function to be concave (not necessarily smooth). Nevertheless, the resulting smoothed dual function is smooth, which allows us to use any smooth optimization technique, such as gradient-based methods or sequential quadratic programming (SQP) methods, to solve the master problem.

2. The new algorithm allows us to solve the primal subproblems inexactly, where the accuracy can be controlled up to $\delta^* \approx 0.043286$ (see Section 4 for more details), so that at the early steps of the path-following algorithm they can be solved very inexactly. This point is significant if the primal subproblems require high computational cost. Note that the algorithm developed in this paper is different from the one considered in [27] for linear programming, where the inexactness of the primal subproblems is defined in a different way.

3. Based on a recent monograph [17], we directly analyze the convergence of the algorithm. This makes our theory self-contained. Moreover, it also allows us to choose the parameters optimally and to trade off between the convergence rate of the master problem and the accuracy of the primal subproblems.

4. In the exact case, the variant in this paper still has some advantages compared with the previous ones. Firstly, the radius of the neighborhood of the analytic center is $(3 - \sqrt{5})/2 \approx 0.38197$, which is larger than the radius $2 - \sqrt{3} \approx 0.26795$ of previous methods. Secondly, since the performance of an interior point algorithm crucially depends on its parameters, we directly analyze the path-following iteration to select these parameters in an optimal way.
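As a quick arithmetic check of the two radii quoted in item 4 (this only evaluates the closed-form constants; it does not reproduce the analysis that yields them):

```python
import math

# Radius of the analytic-center neighborhood for the variant in this paper
new_radius = (3.0 - math.sqrt(5.0)) / 2.0
# Radius reported for the previously known methods
old_radius = 2.0 - math.sqrt(3.0)

print(f"{new_radius:.5f} > {old_radius:.5f}")  # prints 0.38197 > 0.26795
```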

The rest of this paper is organized as follows. In the next section, we briefly describe the Lagrangian dual decomposition method applied to separable convex optimization. Section 3 deals with a smoothing technique for the dual function via self-concordant barriers and investigates the main properties of the smooth dual function. Section 4 presents an inexact perturbed path-following decomposition algorithm, analyzes its convergence and estimates its worst-case complexity. Section 5 considers an exact variant of the algorithm presented in Section 4. Section 6 discusses implementation details of the algorithms. Section 7 shows numerical tests and a comparison. Concluding remarks are included in the last section. The proofs of the technical statements are given in the appendix.

Notation and Terminology. Throughout the paper, we work in the Euclidean space $\mathbb{R}^n$ endowed with the inner product $x^T y$ for $x, y \in \mathbb{R}^n$ and the Euclidean norm $\|x\| = \sqrt{x^T x}$. The notation $x = (x_1, \dots, x_M)$ defines a vector in $\mathbb{R}^n$ formed from $M$ sub-vectors $x_i \in \mathbb{R}^{n_i}$, $i = 1, \dots, M$, where $n_1 + \cdots + n_M = n$.

For a proper, lower semicontinuous convex function $f$, the notation $\mathrm{dom}(f)$ denotes the domain of $f$, $\overline{\mathrm{dom}}(f)$ is the closure of $\mathrm{dom}(f)$ and $\partial f(x)$ denotes the subdifferential of $f$ at $x$. For a concave function $f$, we also denote by $\partial f(x)$ the "superdifferential" of $f$ at $x$, where $\partial f(x) := -\partial\{-f(x)\}$. Let $f$ be twice continuously differentiable and convex on $\mathbb{R}^n$. For a given vector $u$, the local norm of $u$ with respect to $f$ at $x$, where $\nabla^2 f(x)$ is positive definite, is defined as $\|u\|_x := [u^T \nabla^2 f(x) u]^{1/2}$, and its dual norm is $\|u\|_x^* := \max\{u^T v \mid \|v\|_x \le 1\} = [u^T \nabla^2 f(x)^{-1} u]^{1/2}$. Clearly, $|u^T v| \le \|u\|_x \|v\|_x^*$. Let $F$ be a standard self-concordant function; then $W^0(x, r) := \{z \in \mathbb{R}^n \mid \|z - x\|_x < r\}$ defines the Dikin ellipsoid of $F$ at $x$, where $\|z - x\|_x = [(z - x)^T \nabla^2 F(x)(z - x)]^{1/2}$.

For a given symmetric matrix $P \in \mathbb{R}^{n \times n}$, the expression $P \succeq 0$ (resp. $P \succ 0$) means that $P$ is positive semidefinite (resp. positive definite); $P \succeq Q$ and $P \preceq Q$ (resp. $P \succ Q$ and $P \prec Q$) mean that $P - Q$ and $Q - P$ are positive semidefinite (resp. positive definite), respectively.

The notations $\mathbb{R}_+$ and $\mathbb{R}_{++}$ denote the sets of nonnegative and positive numbers, respectively. The function $\omega : \mathbb{R}_+ \to \mathbb{R}$ is defined by $\omega(t) := t - \ln(1 + t)$, and its dual $\omega_* : [0, 1) \to \mathbb{R}$ is defined by $\omega_*(t) := -t - \ln(1 - t)$. Note that both functions are convex, nonnegative and increasing. For a real number $x$, $\lfloor x \rfloor$ denotes the largest integer which is less than or equal to $x$.
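The two scalar functions $\omega$ and $\omega_*$ appear in most of the estimates below. A minimal Python sketch of them (the names `omega` and `omega_star` are ours), together with a numerical check of monotonicity and of the elementary inequality $\omega(t) \le \omega_*(t)$ on $[0, 1)$:

```python
import math


def omega(t: float) -> float:
    """omega(t) = t - ln(1 + t), defined for t >= 0."""
    return t - math.log1p(t)


def omega_star(t: float) -> float:
    """omega_*(t) = -t - ln(1 - t), defined for 0 <= t < 1."""
    if not 0.0 <= t < 1.0:
        raise ValueError("omega_star requires 0 <= t < 1")
    return -t - math.log1p(-t)


# Both vanish at 0 and increase; moreover omega <= omega_star on [0, 1).
grid = [k / 100 for k in range(100)]
assert all(omega(a) < omega(b) for a, b in zip(grid, grid[1:]))
assert all(omega(t) <= omega_star(t) for t in grid)
```

Since $\omega_*(t) \to +\infty$ as $t \uparrow 1$, bounds involving $\omega_*$ are only useful on neighborhoods parametrized by some $\beta < 1$.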

2. Lagrangian dual relaxation in convex optimization. A classical technique to address coupling constraints in separable convex optimization is based on Lagrangian relaxation [2]. We briefly review this technique in this section.

Without loss of generality, we consider problem (1.1) with $M = 2$. The separable convex optimization problem (1.1) with $M = 2$ can be expressed as:

$$
\phi^* := \max_{x} \{\phi(x) := \phi_1(x_1) + \phi_2(x_2)\} \quad \text{s.t.} \quad A_1 x_1 + A_2 x_2 = b, \quad x \in X := X_1 \times X_2. \tag{2.1}
$$

Let us define $A := [A_1, A_2]$ and $n := n_1 + n_2$. The linear coupling constraint $A_1 x_1 + A_2 x_2 = b$ can be written as $Ax = b$. The Lagrange function for problem (2.1) with respect to the coupling constraint $A_1 x_1 + A_2 x_2 = b$ is defined as:

$$
L(x, y) := \phi(x) + y^T(Ax - b) = \phi_1(x_1) + \phi_2(x_2) + y^T(A_1 x_1 + A_2 x_2 - b),
$$

where $y \in \mathbb{R}^m$ is the Lagrange multiplier associated with the coupling constraint. A pair $(x_0^*, y_0^*) \in X \times \mathbb{R}^m$ is called a saddle point of $L$ if

$$
L(x, y_0^*) \le L(x_0^*, y_0^*) \le L(x_0^*, y), \quad \forall x \in X, \ \forall y \in \mathbb{R}^m.
$$

The dual problem of (2.1) is

$$
d_0^* := \min_{y \in \mathbb{R}^m} d_0(y), \tag{2.2}
$$

where $d_0$ is the dual function, defined as

$$
d_0(y) := \max_{x \in X} \{\phi_1(x_1) + \phi_2(x_2) + y^T(A_1 x_1 + A_2 x_2 - b)\}. \tag{2.3}
$$

If strong duality holds at $(x_0^*, y_0^*)$ with $x_0^* := (x_{0,1}^*, x_{0,2}^*) \in X$ and $y_0^* \in \mathbb{R}^m$, then we have

$$
d_0^* = d_0(y_0^*) = \min_{y \in \mathbb{R}^m} d_0(y) = \max_{x \in X} \{\phi(x) \mid Ax = b\} = \phi(x_0^*) = \phi^*.
$$

Let us denote by $X^*$ the solution set of (2.1) and by $Y^*$ the solution set of the dual problem (2.2). It is well known that if either the Slater condition holds, i.e. $\mathrm{ri}(X) \cap \{x \in \mathbb{R}^n \mid Ax = b\} \neq \emptyset$, where $\mathrm{ri}(X)$ is the relative interior of the convex set $X$, or $X$ is polyhedral, then $Y^*$ is bounded [3].

Finally, it is important to notice that the dual function $d_0(\cdot)$ can be computed separately by

$$
d_0(y) = d_{0,1}(y) + d_{0,2}(y) - b^T y, \tag{2.4}
$$

where

$$
d_{0,i}(y) := \max_{x_i \in X_i} \{\phi_i(x_i) + y^T A_i x_i\}, \quad i = 1, 2.
$$

Let $x_{0,i}^*(y)$ be a solution of the maximization problem defining $d_{0,i}(y)$ ($i = 1, 2$), and $x_0^*(y) := (x_{0,1}^*(y), x_{0,2}^*(y))$. Lagrangian relaxation generally leads to a nonsmooth optimization problem in the dual form. Consequently, the numerical solution of the dual problem encounters many difficulties.
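To make the decomposition (2.4) concrete, the sketch below assumes, purely for illustration, that each $X_i$ is a box $[0, 1]^{n_i}$ and $\phi_i(x_i) = c_i^T x_i$ is linear, so that every $d_{0,i}$ has a closed form. The helper names `block_dual` and `dual_function` and the toy data are hypothetical. Evaluating $d_0$ at a few points exposes the kinks that make the dual nonsmooth:

```python
def block_dual(c_i, A_i, y):
    """d_{0,i}(y) = max over x_i in [0,1]^{n_i} of (c_i + A_i^T y)^T x_i (closed form).

    A_i is stored as a list of m rows, each of length n_i.
    """
    value, x_i = 0.0, []
    for j, c_ij in enumerate(c_i):
        s_j = c_ij + sum(A_i[r][j] * y[r] for r in range(len(y)))  # (c_i + A_i^T y)_j
        x_ij = 1.0 if s_j > 0.0 else 0.0  # any point of [0, 1] is optimal when s_j == 0
        value += s_j * x_ij
        x_i.append(x_ij)
    return value, x_i


def dual_function(blocks, b, y):
    """d_0(y) = sum_i d_{0,i}(y) - b^T y as in (2.4); the blocks are independent."""
    total = sum(block_dual(c_i, A_i, y)[0] for c_i, A_i in blocks)
    return total - sum(b_r * y_r for b_r, y_r in zip(b, y))


# Toy instance: max x1 + 2 x2  s.t.  x1 + x2 = 1,  x1, x2 in [0, 1]  (phi* = 2)
blocks = [([1.0], [[1.0]]), ([2.0], [[1.0]])]
b = [1.0]
for y in (-3.0, -2.0, -1.5, -1.0, 0.0):
    print(y, dual_function(blocks, b, [y]))
```

For this instance $d_0(y) = \max\{0, 1 + y\} + \max\{0, 2 + y\} - y$ is piecewise linear with kinks at $y = -2$ and $y = -1$ and attains $d_0^* = 2 = \phi^*$ on the whole interval $[-2, -1]$, so both nondifferentiability and nonuniqueness of the dual solution already appear in one dimension.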

3. Smoothing technique via self-concordant barriers. Let us assume that the feasible set $X_i$ is convex, has nonempty interior and possesses a $\nu_i$-self-concordant barrier $F_i$ for $i = 1, 2$. The theory of self-concordant functions and self-concordant barriers can be found in, e.g., [16, 17]. Throughout the paper, we use the following assumptions.

Assumption A.1.

(a) The solution set $X^*$ of (2.1) is nonempty. Either the Slater condition for (2.1) is satisfied or $X$ is polyhedral.

(b) The feasible set $X_i$ is bounded in $\mathbb{R}^{n_i}$ with $\mathrm{int}(X_i) \neq \emptyset$ and possesses a self-concordant barrier $F_i$ with parameter $\nu_i$ for $i = 1, 2$.

(c) The function $\phi_i$ is proper, upper semicontinuous and concave on $X_i$ for $i = 1, 2$.

(d) The matrix $A$ has full row rank.

Note that Assumptions A.1(a) and A.1(c) are standard in convex optimization; they guarantee the solvability of the problem and strong duality. Assumption A.1(b) can be satisfied by assuming that the set of sample points generated by the optimization algorithm is bounded. Assumption A.1(d) is not restrictive since it can be satisfied by applying standard linear algebra techniques to eliminate redundant constraints.

Remark 1. As we will see later, the convex feasible set $X_i$ can be given as follows:

$$
X_i := X_i^c \cap X_i^a, \qquad X_i^a := \{x_i \in \mathbb{R}^{n_i} \mid E_i x_i = f_i\},
$$

where $\mathrm{int}(X_i^c)$ is nonempty and $X_i^c$ possesses a $\nu_i$-self-concordant barrier $F_i$. Let $E = [E_1, E_2]$ be the matrix formed from the $E_i$, let $A/E$ be a reduced form of $\begin{bmatrix} A \\ E \end{bmatrix}$, and let $\mathrm{int}(X_i) := \mathrm{int}(X_i^c) \cap X_i^a$ for $i = 1, 2$. In this case, the theory developed in the next sections can be extended to the problem with this constraint, see, e.g., [15].
Let us denote by $x_i^c$ the analytic center of $X_i$, which is defined as

$$
x_i^c := \mathop{\mathrm{argmin}}_{x_i \in \mathrm{ri}(X_i)} F_i(x_i), \quad i = 1, 2.
$$

Under Assumption A.1(b), $x^c := (x_1^c, x_2^c)$ is well-defined due to [16, Corollary 2.3.6]. To compute $x^c$, one can apply the algorithms proposed in [17, pp. 204-205]. Moreover, the following estimates hold:

$$
F_i(x_i) - F_i(x_i^c) \ge \omega(\|x_i - x_i^c\|_{x_i^c}) \quad \text{and} \quad \|x_i - x_i^c\|_{x_i^c} \le \nu_i + 2\sqrt{\nu_i} \tag{3.1}
$$

for all $x_i \in \mathrm{dom}(F_i)$ and $i = 1, 2$ [17, Theorems 4.1.13 and 4.2.6].
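The estimates (3.1) can be checked numerically in the simplest case. The sketch below assumes $X_i = [0, 1]$ with the barrier $F(x) = -\ln x - \ln(1 - x)$, which is a $2$-self-concordant barrier whose analytic center is $x^c = 1/2$ by symmetry; the grid and the helper names are ours:

```python
import math

NU = 2.0   # parameter of the barrier F below
X_C = 0.5  # analytic center of [0, 1]


def F(x):
    """Log-barrier of the interval [0, 1]."""
    return -math.log(x) - math.log(1.0 - x)


def F_hess(x):
    return 1.0 / x**2 + 1.0 / (1.0 - x)**2


def omega(t):
    return t - math.log1p(t)


violations = 0
for k in range(1, 1000):
    x = k / 1000.0
    r = math.sqrt(F_hess(X_C)) * abs(x - X_C)  # ||x - x^c||_{x^c}
    if F(x) - F(X_C) < omega(r) - 1e-12:       # first inequality in (3.1)
        violations += 1
    if r > NU + 2.0 * math.sqrt(NU):           # second inequality in (3.1)
        violations += 1

print("violations:", violations)  # expected: 0
```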
3.1. A smooth approximation of the dual function. Similarly to [10, 15, 22, 29], we construct a smooth approximation of the nonsmooth dual function $d_0$ defined by (2.3) via self-concordant barriers.

Let us define the following functions:

$$
d_i(y, t) := \max_{x_i \in \mathrm{int}(X_i)} \{\phi_i(x_i) + y^T A_i x_i - t[F_i(x_i) - F_i(x_i^c)]\}, \quad i = 1, 2, \tag{3.2}
$$

and

$$
d(y, t) := d_1(y, t) + d_2(y, t) - b^T y,
$$

where $t > 0$ is referred to as a smoothness or barrier parameter. Note that, due to the strict concavity of the objective function, the maximization problem in (3.2) has a unique solution, which is denoted by $x_i^*(y, t)$. Consequently, the functions $d_i(\cdot, t)$ ($i = 1, 2$) and $d(\cdot, t)$ are well-defined and smooth on $\mathbb{R}^m$ for any $t > 0$. As in [29], we refer to $d$ as a smooth dual approximation of $d_0$ and to the maximization problem in (3.2) as a primal subproblem.

If we denote $x^*(y, t) := (x_1^*(y, t), x_2^*(y, t))$, then we can write

$$
d(y, t) = \phi(x^*(y, t)) + y^T(Ax^*(y, t) - b) - t[F(x^*(y, t)) - F(x^c)].
$$

The optimality condition for (3.2) is

$$
0 \in \partial\phi_i(x_i^*(y, t)) + A_i^T y - t\nabla F_i(x_i^*(y, t)), \quad i = 1, 2, \tag{3.3}
$$

where $\partial\phi_i(x_i^*(y, t))$ is the superdifferential of $\phi_i$ at $x_i^*(y, t)$ ($i = 1, 2$). Since problem (3.2) is convex, this condition is necessary and sufficient.

Associated with the smooth dual function $d(\cdot, t)$, we consider the following master problem:

$$
d^*(t) := \min_{y \in \mathbb{R}^m} d(y, t). \tag{3.4}
$$

We denote by $y^*(t)$ a solution of (3.4), if it exists, and by $x^*(t) := x^*(y^*(t), t)$.

For a given $\beta \in (0, 1)$, we define a neighbourhood in $\mathbb{R}^m$ with respect to $F_i$ and $t > 0$ as

$$
\mathcal{N}_t^{F_i}(\beta) := \{y \in \mathbb{R}^m \mid \lambda_{F_i}(x_i^*(y, t)) := \|\nabla F_i(x_i^*(y, t))\|_{x_i^*(y, t)}^* \le \beta\}.
$$

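The following lemma provides a local estimate for $d_0(\cdot)$; its proof can be found in the appendix.

Lemma 3.1. Under Assumption A.1 and for $\beta \in (0, 1)$, the function $d(\cdot, t)$ defined by (3.2) satisfies

$$
0 \le t\sum_{i=1}^{2} \omega(\|x_i^*(y, t) - x_i^c\|_{x_i^c}) \le d_0(y) - d(y, t) \le t\sum_{i=1}^{2} \left[\omega_*(\lambda_{F_i}(x_i^*(y, t))) + \nu_i\right] \tag{3.5}
$$

for any $y \in \mathcal{N}_t^{F_1}(\beta) \cap \mathcal{N}_t^{F_2}(\beta)$.

From Lemma 3.1, since $\omega_*$ is increasing and $\lambda_{F_i}(x_i^*(y, t)) \le \beta$, we see that

$$
0 \le d_0(y) - d(y, t) \le t[2\omega_*(\beta) + \nu_1 + \nu_2], \quad \forall y \in \mathcal{N}_t^{F_1}(\beta) \cap \mathcal{N}_t^{F_2}(\beta).
$$

Hence, for $t = t_f > 0$ sufficiently small, $d(\cdot, t_f)$ is a local approximation to $d_0(\cdot)$.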
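Under Assumption A.1, the dual optimal solution set $Y^*$ is bounded. Without loss of generality, we can assume that $Y$ is a bounded set such that $Y^* \subset Y$. Let

$$
d_c(y) := \phi(x^c) + y^T(Ax^c - b),
$$

where $x^c$ is the analytic center of $X$. From (2.3), we have

$$
d_0(y) - d_c(y) = \max_{x \in X}\{\phi(x) + y^T(Ax - b)\} - [\phi(x^c) + y^T(Ax^c - b)] \ge 0, \quad \forall y \in Y.
$$

Furthermore, since $\phi$ is concave, for any $\xi = (\xi_1, \xi_2) \in \partial\phi(x^c)$ we have

$$
\begin{aligned}
0 \le d_0(y) - d_c(y) &= \max_{x \in X}\{\phi(x) - \phi(x^c) + y^T A(x - x^c)\} \\
&\le \max_{x \in X}\Big\{\sum_{i=1}^{2} \|\xi_i + A_i^T y\|_{x_i^c}^* \, \|x_i - x_i^c\|_{x_i^c}\Big\} \\
&\le \sum_{i=1}^{2} (\nu_i + 2\sqrt{\nu_i}) \max_{\xi_i \in \partial\phi_i(x_i^c)} \|\xi_i + A_i^T y\|_{x_i^c}^* \\
&\le K_1 + K_2 < +\infty, \quad \forall y \in Y,
\end{aligned} \tag{3.6}
$$

where $K_i := (\nu_i + 2\sqrt{\nu_i}) \max_{\xi_i \in \partial\phi_i(x_i^c),\, y \in Y} \|\xi_i + A_i^T y\|_{x_i^c}^*$ ($i = 1, 2$). The following lemma shows that $d(\cdot, t)$ is a global approximation to $d_0(\cdot)$. The proof can be found in the appendix.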

Lemma 3.2. Suppose that Assumption A.1 is satisfied. Then, for any $t > 0$ and $y \in Y$, the following estimate holds:

$$
0 \le t\sum_{i=1}^{2} \omega(\|x_i^*(y, t) - x_i^c\|_{x_i^c}) \le d_0(y) - d(y, t) \le t[\zeta(K_1; \nu_1, t) + \zeta(K_2; \nu_2, t)], \tag{3.7}
$$

where $\zeta(\tau; a, b) := a\left(1 + \max\{0, \ln(\tfrac{\tau}{ab})\}\right)$, and $K_1$ and $K_2$ are the two constants given in (3.6).

The proof of the following statement can also be found in the appendix.

Lemma 3.3. For a given tolerance $\varepsilon_d > 0$ and a fixed $\kappa \in (0, 1)$, if we choose $t$ such that $0 < t \le \bar{t}$, where $\bar{t} > 0$ is an explicit threshold depending on $\varepsilon_d$, $\kappa$, $\nu_i$ and $K_i$ ($i = 1, 2$), then it follows from Lemma 3.2 that

$$
d(y, t) \le d_0(y) \le d(y, t) + \varepsilon_d.
$$

In other words, if we fix $t_f \in (0, \bar{t})$ and minimize $d(\cdot, t_f)$ over $Y$, then $y^*(t_f)$ is an $\varepsilon_d$-solution of (2.2).
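Lemma 3.3 gives an explicit admissible threshold $\bar{t}$. As a purely numerical alternative (our own sketch, not the formula of Lemma 3.3), one can compute the largest $t$ for which the upper bound in (3.7) does not exceed $\varepsilon_d$ by bisection. This is valid because $t \mapsto t\,\zeta(K; \nu, t)$ is continuous, strictly increasing and tends to $0$ as $t \downarrow 0$. The constants $K_i$, $\nu_i$ and $\varepsilon_d$ below are hypothetical:

```python
import math


def zeta(tau, a, b):
    """zeta(tau; a, b) = a * (1 + max{0, ln(tau / (a * b))}) from Lemma 3.2."""
    return a * (1.0 + max(0.0, math.log(tau / (a * b))))


def global_gap_bound(t, K, nu):
    """Right-hand side of (3.7): t * sum_i zeta(K_i; nu_i, t)."""
    return t * sum(zeta(K_i, nu_i, t) for K_i, nu_i in zip(K, nu))


def largest_admissible_t(eps_d, K, nu, iters=200):
    """Largest t > 0 with global_gap_bound(t) <= eps_d, found by bisection.

    The bound is continuous, strictly increasing in t and tends to 0 as t -> 0,
    so the admissible set is an interval (0, t_max].
    """
    lo, hi = 0.0, 1.0
    while global_gap_bound(hi, K, nu) <= eps_d:  # grow hi until it is infeasible
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if global_gap_bound(mid, K, nu) <= eps_d:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical constants: K_i as in (3.6), nu_i the barrier parameters
K, nu, eps_d = (10.0, 5.0), (4.0, 6.0), 1e-3
t_f = largest_admissible_t(eps_d, K, nu)
print(t_f, global_gap_bound(t_f, K, nu))
```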

Since $d(\cdot, t)$ is continuously differentiable, smooth optimization techniques such as gradient-based or SQP-based methods can be applied to solve problem (3.4). If we choose $t_f > 0$ sufficiently small, then, according to Lemmas 3.1 and 3.2, we obtain an approximate solution of (2.2) with a desired accuracy.
3.2. The self-concordance of the smooth dual function. If the function $-\phi_i$ is self-concordant on $\mathrm{dom}(-\phi_i)$ with parameter $M_{\phi_i}$, then the family of functions $\phi_i(\cdot, t) := tF_i(\cdot) - \phi_i(\cdot)$ is also self-concordant on $\mathrm{dom}(-\phi_i) \cap \mathrm{dom}(F_i)$. Consequently, the smooth dual function $d(\cdot, t)$ is self-concordant, as stated in the following lemma. The proof of this lemma can be found, for instance, in [12, 15, 22, 29].

Lemma 3.4. Suppose that Assumption A.1 is satisfied. Suppose further that $-\phi_i$ is $M_{\phi_i}$-self-concordant. Then, the function $d_i(\cdot, t)$ defined by (3.2) is self-concordant with parameter $M_{d_i} := \max\{M_{\phi_i}, \tfrac{2}{\sqrt{t}}\}$ for any $t > 0$ and $i = 1, 2$. Consequently, $d(\cdot, t)$ is self-concordant with parameter $M_d := \max\{M_{\phi_1}, M_{\phi_2}, \tfrac{2}{\sqrt{t}}\}$.

Similar to standard path-following methods [16, 17], in the following discussion we assume that $\phi_i$ is linear, as stated in Assumption A.2 below.

Assumption A.2. The function $\phi_i$ is linear, i.e. $\phi_i(x_i) := c_i^T x_i$ for $i = 1, 2$.

Let $c := (c_1, c_2)$ be the vector formed from the $c_i$ ($i = 1, 2$). Assumption A.2 implies that $tF - \phi$ is $\tfrac{2}{\sqrt{t}}$-self-concordant. According to Lemma 3.4, $d_i(\cdot, t)$ is $\tfrac{2}{\sqrt{t}}$-self-concordant. Since $\phi$ is linear, if we denote by $F(x) := F_1(x_1) + F_2(x_2)$ the self-concordant barrier of $X$ with parameter $\nu := \nu_1 + \nu_2$, then the optimality condition (3.3) reduces to

$$
c + A^T y - t\nabla F(x^*(y, t)) = 0. \tag{3.9}
$$

The following lemma provides explicit formulas for the derivatives of $d(\cdot, t)$. The proof can be found in [15, 29].

Lemma 3.5. Suppose that Assumptions A.1 and A.2 are satisfied. Then the first and second order derivatives of $d(\cdot, t)$ on $Y$ are given, respectively, by

$$
\nabla d(y, t) = Ax^*(y, t) - b \quad \text{and} \quad \nabla^2 d(y, t) = \frac{1}{t} A \nabla^2 F(x^*(y, t))^{-1} A^T, \tag{3.10}
$$

where $x^*(y, t) = (x_1^*(y, t), x_2^*(y, t))$ is the solution of the primal subproblem in (3.2).

Note that since $A$ has full row rank and $\nabla^2 F(x^*(y, t))$ is positive definite, the matrix $\nabla^2 d(y, t)$ is nonsingular (in fact, positive definite) for any $y \in Y$. Moreover, since $F$ and $\phi$ are separable, the Hessian matrix $\nabla^2 F$ is block diagonal and its blocks can be evaluated in parallel; see Section 6 for more details about implementation issues.
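The formulas (3.10) can be sanity-checked on a small instance. The sketch below assumes, for illustration only, that $X = [0, 1]^n$ with $F(x) = \sum_j [-\ln x_j - \ln(1 - x_j)]$ (so $\nu = 2n$ and $x^c = (1/2, \dots, 1/2)$) and $\phi(x) = c^T x$. Then the primal subproblem (3.2) separates coordinate-wise, (3.9) reduces to one scalar quadratic equation per coordinate, and $\nabla^2 F$ is diagonal, mirroring the parallel structure mentioned above. The data and function names are hypothetical; the gradient and Hessian are compared against central finite differences:

```python
import math


def barrier(v):
    return -math.log(v) - math.log(1.0 - v)


def primal_solution(s, t):
    """Root in (0, 1) of s - t * F'(v) = 0 (one coordinate of (3.9)).

    Equivalent to s*v^2 + (2t - s)*v - t = 0, written without cancellation.
    """
    return 2.0 * t / (2.0 * t - s + math.sqrt(s * s + 4.0 * t * t))


def smoothed_dual(c, A, b, y, t):
    """Value of d(y, t) and the derivatives (3.10) for X = [0, 1]^n, phi = c^T x."""
    m, n = len(A), len(c)
    s = [c[j] + sum(A[r][j] * y[r] for r in range(m)) for j in range(n)]  # c + A^T y
    x = [primal_solution(s_j, t) for s_j in s]                            # x*(y, t)
    value = sum(s[j] * x[j] - t * (barrier(x[j]) - barrier(0.5)) for j in range(n))
    value -= sum(b[r] * y[r] for r in range(m))
    grad = [sum(A[r][j] * x[j] for j in range(n)) - b[r] for r in range(m)]
    h_inv = [1.0 / (1.0 / v**2 + 1.0 / (1.0 - v)**2) for v in x]  # diag of F''(x)^{-1}
    hess = [[sum(A[p][j] * h_inv[j] * A[q][j] for j in range(n)) / t
             for q in range(m)] for p in range(m)]
    return value, grad, hess


# Hypothetical data: n = 3 variables, m = 2 coupling constraints
c = [1.0, 2.0, -0.5]
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
b = [1.0, 0.8]
y, t, h = [-1.2, 0.3], 0.5, 1e-6

_, grad, hess = smoothed_dual(c, A, b, y, t)
for r in range(2):
    yp = [y[k] + (h if k == r else 0.0) for k in range(2)]
    ym = [y[k] - (h if k == r else 0.0) for k in range(2)]
    vp, gp, _ = smoothed_dual(c, A, b, yp, t)
    vm, gm, _ = smoothed_dual(c, A, b, ym, t)
    assert abs((vp - vm) / (2 * h) - grad[r]) < 1e-6              # gradient in (3.10)
    for q in range(2):
        assert abs((gp[q] - gm[q]) / (2 * h) - hess[q][r]) < 1e-5  # Hessian in (3.10)
```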

Now, since $d(\cdot, t)$ is self-concordant, if we define

$$
\tilde{d}(y, t) := \frac{1}{t} d(y, t), \tag{3.11}
$$

then $\tilde{d}(\cdot, t)$ is standard self-concordant, i.e. $M_{\tilde{d}} = 2$, due to [17, Corollary 4.1.2]. For a given vector $v \in \mathbb{R}^m$, we define the norm $\|v\|_y$ with respect to $\tilde{d}(\cdot, t)$ as $\|v\|_y := [v^T \nabla^2 \tilde{d}(y, t) v]^{1/2}$.

3.3. Recovering the optimality and the feasibility. It remains to show the relations between the master problem (3.4), the dual problem (2.2) and the original primal problem (2.1). We first prove the following lemma.

Lemma 3.6. Let Assumption A.1 be satisfied. Then:

a) $d(y, \cdot)$ is nonincreasing in $\mathbb{R}_{++}$ for a given $y \in Y$.

b) $d^*(\cdot)$ defined by (3.4) is differentiable and nonincreasing in $\mathbb{R}_{++}$.

c) It holds that $d^*(t) \le d_0^*$ and $\lim_{t \downarrow 0^+} d^*(t) = d_0^* = \phi^*$. Moreover, $x^*(t)$ is feasible for problem (2.1).
Proof. Since the function $\xi(x, y, t) := \phi(x) + y^T(Ax - b) - t[F(x) - F(x^c)]$ is strictly concave in $x$ and linear in $t$, it is well known that $d(y, t) = \max_{x \in \mathrm{int}(X)} \xi(x, y, t)$ is differentiable with respect to $t$ and its derivative is given by

$$
\frac{\partial d(y, t)}{\partial t} = -[F(x^*(y, t)) - F(x^c)] \le -\omega(\|x^*(y, t) - x^c\|_{x^c}) \le 0
$$

by (3.1). Thus $d(y, \cdot)$ is nonincreasing in $t$, which proves a).

Now, we prove b) and c). From the definitions of $d^*(\cdot)$, $d(y, \cdot)$ and $y^*(\cdot)$ in (3.4), and by using strong duality, we have

$$
\begin{aligned}
d^*(t) = d(y^*(t), t) &= \min_{y \in Y} d(y, t) \\
&= \min_{y \in Y} \max_{x \in \mathrm{int}(X)} \{\phi(x) + y^T(Ax - b) - t[F(x) - F(x^c)]\} \\
&= \max_{x \in \mathrm{int}(X)} \min_{y \in Y} \{\phi(x) + y^T(Ax - b) - t[F(x) - F(x^c)]\} \\
&= \max_{x \in \mathrm{int}(X)} \{\phi(x) - t[F(x) - F(x^c)] \mid Ax = b\} \\
&= \phi(x^*(t)) - t[F(x^*(t)) - F(x^c)].
\end{aligned} \tag{3.12}
$$

It follows from the fourth line of (3.12) that $d^*(\cdot)$ is differentiable and nonincreasing in $\mathbb{R}_{++}$. Moreover, since $x^c$ is the analytic center of $X$, we have $F(x^*(t)) - F(x^c) \ge \omega(\|x^*(t) - x^c\|_{x^c}) \ge 0$ due to (3.1). This inequality implies that $d^*(t) \le \phi(x^*(t)) \le \phi^* = d_0^*$. On the other hand, from the fourth line of (3.12), we also deduce that $x^*(t)$ is feasible for (2.1). Furthermore, since $d^*(\cdot)$ is continuous on $\mathbb{R}_{++}$, we have $\lim_{t \downarrow 0^+} d^*(t) = d_0^*$, which proves c). □

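Let us define the Newton decrement of $\tilde{d}(\cdot, t)$ as follows:

$$
\lambda = \lambda_{\tilde{d}(\cdot, t)}(y) := \|\nabla \tilde{d}(y, t)\|_y^* = \left[\nabla \tilde{d}(y, t)^T \nabla^2 \tilde{d}(y, t)^{-1} \nabla \tilde{d}(y, t)\right]^{1/2}. \tag{3.13}
$$

The following lemma shows the gap between $d(\cdot, t)$ and $d^*(t)$.

Lemma 3.7. Suppose that Assumption A.1 is satisfied. Then, for any $y \in Y$ and $t > 0$ such that $\lambda_{\tilde{d}(\cdot, t)}(y) < 1$, one has

$$
0 \le t\,\omega(\lambda_{\tilde{d}(\cdot, t)}(y)) \le d(y, t) - d^*(t) \le t\,\omega_*(\lambda_{\tilde{d}(\cdot, t)}(y)). \tag{3.14}
$$

Consequently, it holds that

$$
d(y, t) - d_0^* = d(y, t) - \phi^* \le t\,\omega_*(\lambda_{\tilde{d}(\cdot, t)}(y)). \tag{3.15}
$$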

Proof. Since $\tilde{d}(\cdot, t)$ is standard self-concordant, for any $y \in Y$ such that $\lambda_{\tilde{d}(\cdot, t)}(y) < 1$ and $y^*(t) = \mathrm{argmin}_{y \in Y} \tilde{d}(y, t)$, by applying [17, Theorem 4.1.13, inequality (4.1.17)], we have

$$
0 \le \omega(\lambda_{\tilde{d}(\cdot, t)}(y)) \le \tilde{d}(y, t) - \tilde{d}(y^*(t), t) \le \omega_*(\lambda_{\tilde{d}(\cdot, t)}(y)).
$$

Multiplying by $t$ and using (3.11), this inequality is exactly (3.14). To prove (3.15), we note that $d^*(t) - d_0^* \le 0$ by Lemma 3.6 c); adding this inequality to (3.14) and noting that $d_0^* = \phi^*$, we obtain (3.15). □
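On a toy instance (box feasible set, linear objective, a single coupling constraint; all data hypothetical), the sketch below evaluates the Newton decrement (3.13) through the scaling (3.11), computes $d^*(t)$ with damped Newton steps on the standard self-concordant function $\tilde{d}(\cdot, t)$, and checks the two-sided bound (3.14) at a few points $y$ with $\lambda < 1$:

```python
import math

# Hypothetical toy data: max x1 + 2 x2 s.t. x1 + x2 = 1, x in [0,1]^2 (so m = 1)
C, A_ROW, B = [1.0, 2.0], [1.0, 1.0], 1.0


def F(v):
    return -math.log(v) - math.log(1.0 - v)


def x_star(s, t):
    return 2.0 * t / (2.0 * t - s + math.sqrt(s * s + 4.0 * t * t))


def d_and_derivs(y, t):
    """d(y, t) together with its derivatives (3.10); all scalars since m = 1."""
    val, grad, hess = -B * y, -B, 0.0
    for c_j, a_j in zip(C, A_ROW):
        s = c_j + a_j * y
        v = x_star(s, t)
        val += s * v - t * (F(v) - F(0.5))
        grad += a_j * v
        hess += a_j * a_j / (1.0 / v**2 + 1.0 / (1.0 - v)**2) / t
    return val, grad, hess


def newton_decrement(y, t):
    """lambda in (3.13), computed for d~ = d / t from (3.11)."""
    _, g, h = d_and_derivs(y, t)
    return abs(g / t) / math.sqrt(h / t)


def omega(u):
    return u - math.log1p(u)


def omega_star(u):
    return -u - math.log1p(-u)


def d_star(t, y=0.0, iters=100):
    """d*(t) from (3.4) via damped Newton steps on the standard self-concordant d~."""
    for _ in range(iters):
        _, g, h = d_and_derivs(y, t)
        y -= (g / h) / (1.0 + newton_decrement(y, t))
    return d_and_derivs(y, t)[0]


t = 0.3
dstar = d_star(t)
rows = []
for y in (-2.0, -1.7, -1.5, -1.2, -1.0):
    lam = newton_decrement(y, t)
    gap = d_and_derivs(y, t)[0] - dstar
    rows.append((y, lam, t * omega(lam), gap, t * omega_star(lam)))  # terms of (3.14)
for row in rows:
    print("y=%5.2f  lambda=%.4f  %.3e <= %.3e <= %.3e" % row)
```

For this instance the symmetry $F(v) = F(1 - v)$ of the barrier gives $x_1^* + x_2^* = 1$ at $y = -1.5$ for every $t > 0$, so $y^*(t) = -1.5$ and the gap at that point is zero up to rounding.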

We can also estimate a lower bound for $d^*(t) - d_0^*$. Since $F$ is convex, by using (3.1), we have

$$
F(x) - F(x^c) \le \nabla F(x)^T(x - x^c) \le \|\nabla F(x)\|_{x^c}^* \|x - x^c\|_{x^c} \le (\nu + 2\sqrt{\nu})\|\nabla F(x)\|_{x^c}^*.
$$
Since $X$ is bounded and $\nabla F$ is continuous, using the above inequality, we have $c_F^X := \max_{x \in X} \|\nabla F(x)\|_{x^c}^* < +\infty$. Thus, it follows from the last inequality that $\max_{x \in X}\{F(x) - F(x^c)\} \le (\nu + 2\sqrt{\nu})c_F^X < +\infty$. Moreover, for any functions $u$, $v$ on $Z$, we have $\max_{z \in Z}\{u(z) - v(z)\} \ge \max_{z \in Z} u(z) - \max_{z \in Z} v(z)$. Finally, we estimate $d^*(t) - d_0^*$ as

$$
\begin{aligned}
d^*(t) = \min_{y \in Y} d(y, t) &= \min_{y \in Y}\Big\{\max_{x \in \mathrm{int}(X)}\{L(x, y) - t[F(x) - F(x^c)]\}\Big\} \\
&\ge \min_{y \in Y}\Big\{\max_{x \in \mathrm{int}(X)} L(x, y) - t\max_{x \in \mathrm{int}(X)}\{F(x) - F(x^c)\}\Big\} \\
&\ge \min_{y \in Y}\max_{x \in X} L(x, y) - t\max_{x \in \mathrm{int}(X)}\{F(x) - F(x^c)\} \\
&\ge d_0^* - t(\nu + 2\sqrt{\nu})c_F^X.
\end{aligned}
$$

Combining this inequality with (3.14), we obtain

$$
d(y, t) - d_0^* \ge t\left[\omega(\lambda_{\tilde{d}(\cdot, t)}(y)) - (\nu + 2\sqrt{\nu})c_F^X\right].
$$

Now, we define an approximate solution of the dual problem (2.2) as follows.

Definition 3.8. For a given tolerance $\varepsilon_d > 0$, a point $y^*(t)$ is said to be an $\varepsilon_d$-solution of (2.2) if $0 \le d_0^* - d^*(t) \le \varepsilon_d$.

Let $y^*(t)$ be an $\varepsilon_d$-solution of (2.2) and $y \in Y$ such that $\lambda = \lambda_{\tilde{d}(\cdot, t)}(y) \le \beta$ for a fixed $\beta \in (0, 1)$. We have

$$
|d_0^* - d(y, t)| \le |d(y, t) - d^*(t)| + |d^*(t) - d_0^*| \le \varepsilon_d + t\,\omega_*(\lambda_{\tilde{d}(\cdot, t)}(y)) \le \varepsilon_d + \omega_*(\beta)\,t.
$$

Consequently, if we choose $t$ such that $t \le \omega_*(\beta)^{-1}\varepsilon_d$, then

$$
|d_0^* - d(y, t)| = |\phi^* - d(y, t)| \le 2\varepsilon_d. \tag{3.16}
$$

The algorithms presented in the next sections aim to find a $2\varepsilon_d$-approximate solution of the dual problem (2.2) in the sense of (3.16). Thus, $d(y, t)$ is a $2\varepsilon_d$-approximation of the optimal value $\phi^*$.

It remains to quantify the feasibility gap of the original problem (2.1) with respect to the coupling equality constraint $Ax = b$. We define this feasibility gap with respect to $x^*(y, t)$ as follows:

$$
g_{\mathrm{feas}}(y, t) := \|Ax^*(y, t) - b\|_y^*. \tag{3.17}
$$

Here, $x^*(y, t) \in \mathrm{int}(X)$. From (3.17), (3.10), (3.11) and (3.13), and noting that $\lambda \le \beta$, we have

$$
g_{\mathrm{feas}}(y, t) = \|\nabla d(y, t)\|_y^* = t\lambda \le t\beta.
$$

Therefore, with $t \le \omega_*(\beta)^{-1}\varepsilon_d$, the feasibility gap satisfies

$$
g_{\mathrm{feas}}(y, t) \le \beta\,\omega_*(\beta)^{-1}\varepsilon_d.
$$
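The arithmetic linking $\beta$, $\varepsilon_d$ and the barrier parameter can be tabulated directly. A minimal sketch (the function name `accuracy_guarantees` is ours) that returns the largest admissible $t$, the value bound from (3.16) and the feasibility bound above:

```python
import math


def omega_star(u):
    return -u - math.log1p(-u)


def accuracy_guarantees(beta, eps_d):
    """Largest admissible t and the resulting bounds from (3.16) and on g_feas."""
    t_max = eps_d / omega_star(beta)              # t <= eps_d / omega_*(beta)
    value_gap = eps_d + omega_star(beta) * t_max  # eps_d + omega_*(beta) t = 2 eps_d
    feas_gap = beta * t_max                       # g_feas <= t * beta
    return t_max, value_gap, feas_gap


for beta in (0.1, 0.25, 0.5):
    t_max, value_gap, feas_gap = accuracy_guarantees(beta, 1e-3)
    print(f"beta={beta:.2f}  t_max={t_max:.4e}  "
          f"|d - phi*| <= {value_gap:.1e}  g_feas <= {feas_gap:.3e}")
```

Since $\omega_*(\beta) \approx \beta^2/2$ for small $\beta$, the feasibility bound $\beta\,\omega_*(\beta)^{-1}\varepsilon_d \approx 2\varepsilon_d/\beta$ grows as the neighborhood shrinks, while the admissible $t$ becomes larger; this illustrates a trade-off in the choice of $\beta$.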

4. Inexact perturbed path-following method for Lagrangian decomposition. This section presents an inexact perturbed path-following algorithm for solving approximately



An Inexact Perturbed Path-Following Method for Lagrangian Decomposition 






4.1. Inexact solution of the primal subproblem. First, we define an inexact solution of (3.2) by using local norms. For given $y \in Y$ and $t > 0$, suppose that we are allowed to solve (3.2) approximately up to a given accuracy $\delta > 0$. More precisely, we define this approximate solution as follows:

Definition 4.1. A vector $x_\delta(y, t)$ is said to be a $\delta$-approximate solution of $x^*(y, t)$ if

$$
\|x_\delta(y, t) - x^*(y, t)\|_{x^*(y, t)} \le \delta. \tag{4.1}
$$

Associated with $x_\delta(\cdot)$, we define the following function:

$$
d_\delta(y, t) := c^T x_\delta(y, t) + y^T(Ax_\delta(y, t) - b) - t[F(x_\delta(y, t)) - F(x^c)]. \tag{4.2}
$$

This function can be considered as an inexact smoothed version of the dual function $d_0$. Next, we introduce two quantities:

$$
\nabla d_\delta(y, t) := Ax_\delta(y, t) - b \quad \text{and} \quad \nabla^2 d_\delta(y, t) := \frac{1}{t} A \nabla^2 F(x_\delta(y, t))^{-1} A^T. \tag{4.3}
$$

Since $x^*(y, t) \in \mathrm{dom}(F) = \mathrm{int}(X)$, we can choose an appropriate $\delta > 0$ such that $x_\delta(y, t) \in \mathrm{dom}(F)$. Hence, $\nabla^2 F(x_\delta(y, t))$ is positive definite, which means that $\nabla^2 d_\delta$ is well-defined. Note that $\nabla d_\delta$ and $\nabla^2 d_\delta$ are not the gradient vector and the Hessian matrix of $d_\delta(\cdot, t)$. However, due to Lemma 3.5 and (4.1), we can consider these quantities as an approximate gradient vector and an approximate Hessian matrix of $d(\cdot, t)$, respectively.
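Let

$$
\tilde{d}_\delta(y, t) := \frac{1}{t} d_\delta(y, t), \tag{4.4}
$$

and let $\tilde{\lambda}$ be the inexact Newton decrement of $\tilde{d}_\delta$, defined by

$$
\tilde{\lambda} = \tilde{\lambda}_{\tilde{d}_\delta(\cdot, t)}(y) := |||\nabla \tilde{d}_\delta(y, t)|||_y^* = \left[\nabla \tilde{d}_\delta(y, t)^T \nabla^2 \tilde{d}_\delta(y, t)^{-1} \nabla \tilde{d}_\delta(y, t)\right]^{1/2}. \tag{4.5}
$$

Here, we use the norm $|||\cdot|||_y^*$ to distinguish it from $\|\cdot\|_y^*$.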

4.2. The algorithmic framework. From Lemma 3.7, we see that if we can generate a sequence $\{(y^k, t_k)\}_{k \ge 0}$ such that $\lambda_k := \lambda_{\tilde{d}(\cdot, t_k)}(y^k) \le \beta < 1$, then

$$
d(y^k, t_k) \to d_0^* = \phi^* \quad \text{and} \quad g_{\mathrm{feas}}(y^k, t_k) \to 0, \quad \text{as } t_k \downarrow 0^+.
$$

The aim of the algorithm is to generate $\{(y^k, t_k)\}_{k \ge 0}$ such that $\lambda_k \le \beta < 1$. First, we fix $t = t_0 > 0$ and find a point $y^0 \in Y$ such that $\lambda_{\tilde{d}(\cdot, t_0)}(y^0) \le \beta$. Then we simultaneously update $y$ and $t$ such that $t$ tends to zero. The algorithmic framework is presented as follows.

Inexact-Perturbed Path-following algorithmic framework.

Initialization. Choose an appropriate $\beta \in (0, 1)$ and a tolerance $\varepsilon_d > 0$. Fix $t := t_0 > 0$.

Phase 1 (Determine a starting point $y^0 \in Y$ such that $\tilde{\lambda}_{\tilde{d}_\delta(\cdot, t_0)}(y^0) \le \beta$).
Choose an initial vector $y^{0,0} \in Y$. Set $j := 0$.
For $j = 0, 1, \dots$ perform:

1. If $\tilde{\lambda}_j := \tilde{\lambda}_{\tilde{d}_\delta(\cdot, t_0)}(y^{0,j}) \le \beta$, then set $y^0 := y^{0,j}$ and terminate.

2. Solve the primal subproblems (3.2) in parallel to obtain an approximation of $x^*(y^{0,j}, t_0)$.

3. Evaluate $\nabla \tilde{d}_\delta(y^{0,j}, t_0)$ and $\nabla^2 \tilde{d}_\delta(y^{0,j}, t_0)$ by (4.3).

4. Perform an inexact-perturbed damped Newton iteration:
$y^{0,j+1} := y^{0,j} - (1 + \tilde{\lambda}_j)^{-1}\nabla^2 \tilde{d}_\delta(y^{0,j}, t_0)^{-1}\nabla \tilde{d}_\delta(y^{0,j}, t_0)$.

End For

Phase 2. Path-following iterations.
Compute $\sigma \in (0, 1)$. Set $k := 0$.
For $k = 0, 1, \dots$ perform:

1. If $t_k \le \varepsilon_d / \omega_*(\beta)$, then terminate.

2. Update $t_{k+1} := (1 - \sigma)t_k$.

3. Solve (3.2) in parallel to obtain an approximation of $x^*(y^k, t_{k+1})$.

4. Evaluate the quantities $\nabla \tilde{d}_\delta(y^k, t_{k+1})$ and $\nabla^2 \tilde{d}_\delta(y^k, t_{k+1})$.

5. Perform an inexact-perturbed full-step Newton iteration:
$y^{k+1} := y^k - \nabla^2 \tilde{d}_\delta(y^k, t_{k+1})^{-1}\nabla \tilde{d}_\delta(y^k, t_{k+1})$.

End For

Output. A $2\varepsilon_d$-approximate solution $y^k$ of (2.2).
This algorithm is still conceptual. In the following subsections, we specify each step of this algorithmic framework in detail.
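A compact sketch of the two-phase framework on a toy instance ($X = [0, 1]^2$, $\phi(x) = x_1 + 2x_2$, a single coupling row $x_1 + x_2 = 1$; all hypothetical). The primal subproblems are solved inexactly by warm-started damped Newton steps with a decrement tolerance, which is our stand-in for the accuracy rule developed in Section 4.3, and $\sigma = 0.1$, $\beta = 0.25$ are illustrative values rather than the parameter choices analyzed in the paper:

```python
import math

# Hypothetical toy instance: max x1 + 2 x2 s.t. x1 + x2 = 1, x in [0,1]^2.
# With a single coupling row (m = 1) every Newton system is a scalar division.
C, A_ROW, B = [1.0, 2.0], [1.0, 1.0], 1.0


def omega_star(u):
    return -u - math.log1p(-u)


def solve_primal(y, t, x0, tol):
    """Inexact solution of (3.2): damped Newton on F(x) - (s/t) x per coordinate,
    warm-started at x0 and stopped once each coordinate's Newton decrement <= tol."""
    x = list(x0)
    for j, (c_j, a_j) in enumerate(zip(C, A_ROW)):
        s = c_j + a_j * y
        for _ in range(100):
            g = -s / t - 1.0 / x[j] + 1.0 / (1.0 - x[j])
            h = 1.0 / x[j] ** 2 + 1.0 / (1.0 - x[j]) ** 2
            lam = abs(g) / math.sqrt(h)
            if lam <= tol:
                break
            x[j] -= (g / h) / (1.0 + lam)  # damped step keeps x[j] inside (0, 1)
    return x


def inexact_derivatives(t, x):
    """The quantities (4.3) evaluated at the approximate primal point x."""
    grad = sum(a_j * v for a_j, v in zip(A_ROW, x)) - B
    hess = sum(a_j * a_j / (1.0 / v ** 2 + 1.0 / (1.0 - v) ** 2)
               for a_j, v in zip(A_ROW, x)) / t
    return grad, hess


def path_following(y, t0, beta, eps_d, sigma, tol=1e-10):
    x, t = [0.5] * len(C), t0  # start the primal solver at the analytic center
    # Phase 1: damped Newton steps at fixed t0 until the inexact decrement <= beta
    while True:
        x = solve_primal(y, t, x, tol)
        g, h = inexact_derivatives(t, x)
        lam = abs(g) / math.sqrt(t * h)  # (4.5): |g / t| / sqrt(h / t)
        if lam <= beta:
            break
        y -= (g / h) / (1.0 + lam)       # the 1/t scaling cancels in the step
    # Phase 2: shrink t, then take one full Newton step for each new value of t
    while t > eps_d / omega_star(beta):
        t *= 1.0 - sigma
        x = solve_primal(y, t, x, tol)
        g, h = inexact_derivatives(t, x)
        y -= g / h
    return y, t, solve_primal(y, t, x, tol)


y, t, x = path_following(y=0.0, t0=1.0, beta=0.25, eps_d=1e-4, sigma=0.1)
print(y, t, x)
```

Because $\nabla^2 \tilde{d}_\delta^{-1} \nabla \tilde{d}_\delta = \nabla^2 d_\delta^{-1} \nabla d_\delta$, the factor $1/t$ from (4.4) only enters through the decrement $\tilde{\lambda}$, not through the Newton direction itself.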

Let us emphasize an important point. In order to compute $d(y, t)$, we have to solve the maximization problem in (3.2) exactly, or equivalently, to solve the system of nonlinear equations (3.9) exactly. This requirement is impractical. In practice, we can only solve this problem up to a desired accuracy $\delta > 0$. Therefore, the theory of the path-following algorithms for solving (3.4) presented in, e.g., [10] may no longer apply. Here, we propose an inexact perturbed path-following algorithm for solving (3.4). This algorithm allows us to solve the primal subproblem (3.2) inexactly. Consequently, inexact-perturbed Newton-type iterations are performed, which means that not only an inexact gradient but also an inexact Hessian of $d(\cdot, t)$ are used.

4.3. Computing inexact solution Xg. Note that condition (|4.1I) can not be 

used in practice to compute xg since x* {y, t) is unknown. We show how to compute 
xg such that (|4.1I) holds based on the optimality condition (|3.9[) . 

For sake of notational simplicity, we abbreviate by xg := xg(y,t) and x* := 
x*(y,t). The error of the approximate solution xg to x* is defined as 

(4.6) S(xg,x*) := \\x- s {y,t) - x*(y,t)|| x *(v,t)- 

It follows from the definitions of $d(\cdot,t)$ and $d_\delta(\cdot,t)$, and from (3.9), that

$$d(y,t) - d_\delta(y,t) = [c + A^Ty]^T(x^* - x_\delta) - t[F(x^*) - F(x_\delta)] = -t\,\big[F(x^*) + \nabla F(x^*)^T(x_\delta - x^*) - F(x_\delta)\big].$$

Since $F$ is self-concordant, by applying [17, Theorems 4.1.7 and 4.1.8] and the definition of $\delta(x_\delta,x^*)$, the above equality implies

$$0 \le t\,\omega(\delta(x_\delta,x^*)) \le d(y,t) - d_\delta(y,t) \le t\,\omega_*(\delta(x_\delta,x^*)). \tag{4.7}$$

Here, the last inequality holds if $\delta(x_\delta,x^*) < 1$.

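Next, using again the optimality condition (3.9), we have

$$E_\delta := \|c + A^Ty - t\nabla F(x_\delta)\|^*_{x^c} = t\,\|\nabla F(x_\delta) - \nabla F(x^*)\|^*_{x^c} \ge \frac{t}{\nu + 2\sqrt{\nu}}\,\|\nabla F(x_\delta) - \nabla F(x^*)\|^*_{x^*},$$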




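where the last inequality follows from [17, Corollary 4.2.1]. Combining this inequality and [17, Theorem 4.1.7], we obtain

$$\begin{aligned}
\frac{\delta(x_\delta,x^*)^2}{1 + \delta(x_\delta,x^*)} &\le [\nabla F(x_\delta) - \nabla F(x^*)]^T(x_\delta - x^*)\\
&\le \|\nabla F(x_\delta) - \nabla F(x^*)\|^*_{x^*}\,\|x_\delta - x^*\|_{x^*}\\
&\le \frac{\nu + 2\sqrt{\nu}}{t}\,E_\delta\,\delta(x_\delta,x^*).
\end{aligned}$$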



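Hence, we get

$$\delta(x_\delta,x^*) \le \frac{(\nu + 2\sqrt{\nu})E_\delta}{t - (\nu + 2\sqrt{\nu})E_\delta}, \tag{4.8}$$

provided that $t > (\nu + 2\sqrt{\nu})E_\delta$. Let us define an accuracy $\varepsilon_p$ for the primal subproblem (3.2) as

$$\varepsilon_p := \frac{\delta t}{(\nu + 2\sqrt{\nu})(1 + \delta)}. \tag{4.9}$$

Then it follows from (4.8) that if

$$E_\delta = \|c + A^Ty - t\nabla F(x_\delta)\|^*_{x^c} \le \varepsilon_p, \tag{4.10}$$

then $x_\delta(y,t)$ satisfies (4.1).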
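As a quick sanity check of the accuracy rule, the following Python sketch assumes that (4.9) has the form $\varepsilon_p = \delta t/((\nu + 2\sqrt{\nu})(1+\delta))$ and plugs $E_\delta = \varepsilon_p$ into the bound (4.8); the result is exactly $\delta$. The function names and the sample values of $\delta$, $t$, $\nu$ are illustrative assumptions, not part of the method.

```python
def primal_accuracy(delta, t, nu):
    """Assumed form of the primal accuracy eps_p in (4.9)."""
    kappa = nu + 2.0 * nu ** 0.5          # nu + 2*sqrt(nu), the constant from (4.8)
    return delta * t / (kappa * (1.0 + delta))


def error_bound(E, t, nu):
    """Right-hand side of (4.8); only valid when t > (nu + 2 sqrt(nu)) * E."""
    kappa = nu + 2.0 * nu ** 0.5
    assert t > kappa * E, "bound (4.8) requires t > (nu + 2 sqrt(nu)) * E"
    return kappa * E / (t - kappa * E)


delta, t, nu = 0.01, 0.5, 40
eps_p = primal_accuracy(delta, t, nu)
print(eps_p, error_bound(eps_p, t, nu))   # second value equals delta up to rounding
```

Since the right-hand side of (4.8) is increasing in $E_\delta$, any residual $E_\delta \le \varepsilon_p$ yields a bound no larger than $\delta$.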

It remains to consider the distance from $d_\delta$ to $d^*_0$ when $t$ is sufficiently small. Suppose that $t \le \omega_*(\beta)^{-1}\varepsilon_d$. Then, by combining (3.16) and (4.7), we have

$$|d_\delta(y,t) - d^*_0| \le 2\,\big[1 + \omega_*(\beta)^{-1}\omega_*(\delta)\big]\,\varepsilon_d, \tag{4.11}$$

provided that $\delta < 1$.

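Remark 2. Since $\hat E_\delta := \|c + A^Ty - t\nabla F(x_\delta)\|^*_{x_\delta} \ge (1-\delta)\|c + A^Ty - t\nabla F(x_\delta)\|^*_{x^*}$, by the same argument as before we can show that if $\hat E_\delta \le \hat\varepsilon_p$, where $\hat\varepsilon_p$ is an accuracy defined analogously to (4.9), then (4.1) holds. This rule can be used to terminate the algorithms presented in the next sections.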

4.4. Phase 2 - The path-following scheme with inexact-perturbed full-step Newton iterations. Now, we analyze Steps 2-5 in Phase 2 of the algorithmic framework. In the path-following fashion, we perform only one inexact-perturbed full-step Newton (IPFNT) iteration for each value of the parameter $t$. In other words, the IPFNT iteration and the update of $t$ are carried out simultaneously. The parameter $t$ is decreased by $t_+ := t - \Delta t$, where $\Delta t > 0$. Hence, one step of the path-following method is performed as follows:



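$$\begin{aligned}
t_+ &:= t - \Delta t,\\
y_+ &:= y - \nabla^2 d_\delta(y,t_+)^{-1}\nabla d_\delta(y,t_+).
\end{aligned} \tag{4.12}$$

Since Newton's method is invariant under linear transformations, by (4.2), the second line of (4.12) is equivalent to

$$y_+ := y - \nabla^2\tilde d_\delta(y,t_+)^{-1}\nabla\tilde d_\delta(y,t_+). \tag{4.13}$$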
For the sake of notational simplicity, in the following analysis we denote all the functions at $(y_+,t_+)$ and $(y,t_+)$ by the sub-index "+" and "1", respectively, and at $(y,t)$ without index. More precisely, we denote by

$$\begin{aligned}
\tilde\lambda_+ &:= \lambda_{\tilde d_\delta(\cdot,t_+)}(y_+), & \delta_+ &:= \delta(x_{\delta+},x^*_+) = \|x_\delta(y_+,t_+) - x^*(y_+,t_+)\|_{x^*(y_+,t_+)},\\
\tilde\lambda_1 &:= \lambda_{\tilde d_\delta(\cdot,t_+)}(y), & \delta_1 &:= \delta(x_{\delta 1},x^*_1) = \|x_\delta(y,t_+) - x^*(y,t_+)\|_{x^*(y,t_+)},\\
\tilde\lambda &:= \lambda_{\tilde d_\delta(\cdot,t)}(y), & \delta &:= \delta(x_\delta,x^*) = \|x_\delta(y,t) - x^*(y,t)\|_{x^*(y,t)},
\end{aligned}$$

and by

$$\Delta := \|x_\delta(y,t_+) - x_\delta(y,t)\|_{x_\delta(y,t)} \quad\text{and}\quad \Delta^* := \|x^*(y,t_+) - x^*(y,t)\|_{x^*(y,t)}.$$

Note that the above notation does not cause any confusion since it can be recognized from the context.

4.4.1. The main estimate. Using the above notation, we provide a main estimate which will be used to analyze the convergence of the algorithm presented in Subsection 4.4.4. The proof of this result is postponed to Subsection 4.6.

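Lemma 4.2. Let $y \in Y$ be given and $t > 0$. Let $(y_+,t_+)$ be a pair generated by (4.12). Suppose that $\delta_1 + 2\Delta + \tilde\lambda < 1$ and $\delta_+ < 1$, and let $\xi := \frac{\Delta + \tilde\lambda}{1 - \delta_1 - 2\Delta - \tilde\lambda}$. Then

$$\tilde\lambda_+ \le \frac{1}{1-\delta_+}\Big\{\xi^2 + \delta_+ + \delta_1 + \delta_1\big[(1-\delta_1)^{-2} + 2(1-\delta_1)^{-1}\big]\xi\Big\}. \tag{4.14}$$

Moreover, the right-hand side of (4.14) is nondecreasing with respect to all variables $\delta_+$, $\delta_1$, $\Delta$ and $\tilde\lambda$.

In particular, if we set $\delta_+ = 0$ and $\delta_1 = 0$, i.e. (3.2) is solved exactly, then $\tilde\lambda_+ = \lambda_+$, $\Delta = \Delta^*$ and (4.14) collapses to

$$\lambda_+ \le \left(\frac{\lambda + \Delta^*}{1 - 2\Delta^* - \lambda}\right)^2, \tag{4.15}$$

provided that $\lambda + 2\Delta^* < 1$.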

4.4.2. Finding the maximum centering parameter $\beta^*$. The key point of the path-following algorithm is to determine the maximum value of $\beta \in (0,\beta^*) \subset (0,1)$ and appropriate values of $\delta$ and $\Delta$ such that if $\tilde\lambda \le \beta$, then $\tilde\lambda_+ \le \beta$. We analyze the estimate (4.14) to find these parameters.

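First, let $\beta \in (0,1)$ be such that $\tilde\lambda \le \beta$, and let $\delta_+, \delta_1 \le \delta$. Since the right-hand side of (4.14) is nondecreasing with respect to all variables, if we define

$$\varphi_\delta(\xi) := \frac{1}{1-\delta}\Big\{2\delta + \xi^2 + \delta\big[(1-\delta)^{-2} + 2(1-\delta)^{-1}\big]\xi\Big\}$$

and $\xi := \frac{\Delta + \beta}{1 - \delta - 2\Delta - \beta}$, then $\tilde\lambda_+ \le \beta$ if $\varphi_\delta(\xi) \le \beta$. This condition leads to $0 \le \xi \le \frac{\sqrt{p^2 + 4q} - p}{2}$ and $0 \le \delta < \frac{\beta}{2 + \beta}$ (so that $q > 0$), where $p := \delta[(1-\delta)^{-2} + 2(1-\delta)^{-1}]$ and $q := (1-\delta)\beta - 2\delta$.

Now, let $\theta := \frac{\sqrt{p^2 + 4q} - p}{2} > 0$. Since $\xi = \frac{\Delta + \beta}{1 - \delta - 2\Delta - \beta} \le \theta$, we have $(1 + 2\theta)\Delta \le \theta(1 - \delta - \beta) - \beta$. Thus, in order to ensure that some $\Delta > 0$ is admissible, we require that $\theta > \frac{\beta}{1 - \delta - \beta}$. This condition leads to

$$\mathcal{P}(\beta) := c_0 + c_1\beta + c_2\beta^2 + c_3\beta^3 > 0, \tag{4.16}$$

where $c_0 := -2\delta(1-\delta)^2 < 0$, $c_1 := (1-\delta)[(1+\delta)^2 - p] > 0$, $c_2 := p - 3 - 2\delta^2 + 2\delta < 0$ and $c_3 := 1 - \delta > 0$. By well-known properties of cubic polynomials, $\mathcal{P}$ has three distinct real roots if $18c_0c_1c_2c_3 - 4c_2^3c_0 + c_2^2c_1^2 - 4c_3c_1^3 - 27c_3^2c_0^2 > 0$. Solving numerically, the last condition leads to $0 \le \delta < \delta_{\max}$, where $\delta_{\max} \approx 0.0432863855$.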
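The numerical step above can be reproduced with a short Python sketch: it evaluates the coefficients of $\mathcal{P}$ from (4.16), forms the cubic discriminant stated in the text, and bisects for the value of $\delta$ where the discriminant changes sign. The bracket $[0.01, 0.05]$ is an assumption chosen so that the discriminant is positive at its left end and negative at its right end; all names are illustrative.

```python
def coefficients(delta):
    """Coefficients (c0, c1, c2, c3) of the cubic P in (4.16)."""
    s = 1.0 - delta
    p = delta * (s ** -2 + 2.0 / s)
    return (-2.0 * delta * s ** 2,
            s * ((1.0 + delta) ** 2 - p),
            p - 3.0 - 2.0 * delta ** 2 + 2.0 * delta,
            s)


def discriminant(delta):
    """Discriminant of c3*b^3 + c2*b^2 + c1*b + c0 (positive: three distinct real roots)."""
    c0, c1, c2, c3 = coefficients(delta)
    return (18 * c0 * c1 * c2 * c3 - 4 * c2 ** 3 * c0 + c2 ** 2 * c1 ** 2
            - 4 * c3 * c1 ** 3 - 27 * c3 ** 2 * c0 ** 2)


def delta_max(lo=0.01, hi=0.05, tol=1e-12):
    """Bisection for the sign change of the discriminant on the assumed bracket [lo, hi]."""
    assert discriminant(lo) > 0 > discriminant(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if discriminant(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(delta_max())  # compare with delta_max ~ 0.0432863855 reported in the text
```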

Finally, we summarize the above analysis into the following theorem. 

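Theorem 4.3. Let $\delta_{\max} = 0.0432863855$ and $0 \le \delta < \delta_{\max}$. Then $\mathcal{P}$ defined by (4.16) has three nonnegative real roots $0 \le \beta_* < \beta^* < \beta^*_3$. Suppose that $\beta \in (\beta_*,\beta^*)$ and let

$$\bar\Delta := \frac{\theta(1 - \delta - \beta) - \beta}{1 + 2\theta}, \quad\text{where}\quad \theta := \frac{\sqrt{p^2 + 4q} - p}{2},$$

and $p$ and $q$ are defined as above. Then, for $0 \le \delta_+ \le \delta$, $0 \le \delta_1 \le \delta$ and $0 \le \Delta \le \bar\Delta$, if $\tilde\lambda \le \beta$ then $\tilde\lambda_+ \le \beta$.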

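Proof. Note that the cubic polynomial $\mathcal{P}(\beta)$ has three real roots if $18c_0c_1c_2c_3 - 4c_2^3c_0 + c_2^2c_1^2 - 4c_3c_1^3 - 27c_3^2c_0^2 > 0$. Numerically, this condition leads to $0 \le \delta < 0.0432863855$. Moreover, one can show that the three roots $\beta_* < \beta^* < 1 < \beta^*_3$ of $\mathcal{P}$ are nonnegative and that $\mathcal{P}(\beta) > 0$ if $\beta \in (\beta_*,\beta^*)$. Furthermore, $\mathcal{P}(\beta) > 0$ implies $\theta(1 - \delta - \beta) - \beta > 0$, where $\theta := \frac{\sqrt{p^2 + 4q} - p}{2}$. Thus, from the definition of $\bar\Delta$, we have

$$\bar\Delta = \frac{\theta(1 - \delta - \beta) - \beta}{1 + 2\theta} > 0.$$

Hence, for $0 \le \Delta \le \bar\Delta$ we have $\xi \le \theta$, and the claim follows from the analysis above. $\square$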

In order to see how the values of $\beta_*$, $\beta^*$ and $\bar\Delta$ vary with respect to the accuracy $\delta$, we illustrate them in Figure 4.1, where the left-hand side shows the values of $\beta_*$ (solid) and $\beta^*$ (dash), and the right-hand side shows the value of $\bar\Delta$ as a function of $\delta$ when $\beta$ is chosen as $\beta := (\beta_* + \beta^*)/2$ (dash) and $\beta := \beta^*/4$ (solid), respectively.



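[Figure 4.1: left panel, $\beta_*$ (solid) and $\beta^*$ (dash) versus $\delta$; right panel, $\bar\Delta$ versus $\delta$; horizontal axes from $\delta = 0.005$ to $\delta = 0.04$.]

Fig. 4.1. The values of $\beta_*$, $\beta^*$ and $\bar\Delta$ varying w.r.t. $\delta$.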



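4.4.3. The update rule for the barrier parameter t. It remains to quantify the decrement $\Delta t$ of the barrier parameter $t$. From (3.9) we have

$$c + A^Ty - t\nabla F(x^*) = 0 \quad\text{and}\quad c + A^Ty - t_+\nabla F(x^*_1) = 0,$$

where $x^* := x^*(y,t)$ and $x^*_1 := x^*(y,t_+)$ are defined as before. Subtracting these equalities and then using $t_+ = t - \Delta t$, we have $t_+[\nabla F(x^*_1) - \nabla F(x^*)] = \Delta t\,\nabla F(x^*)$. Using this relation together with [17, Theorem 4.1.7] and $\|\nabla F(x^*)\|^*_{x^*} \le \sqrt{\nu}$ (see [17, inequality 4.2.4]), we have

$$\begin{aligned}
\frac{t_+\|x^*_1 - x^*\|^2_{x^*}}{1 + \|x^*_1 - x^*\|_{x^*}} &\le t_+[\nabla F(x^*_1) - \nabla F(x^*)]^T(x^*_1 - x^*) = \Delta t\,\nabla F(x^*)^T(x^*_1 - x^*)\\
&\le \Delta t\,\|\nabla F(x^*)\|^*_{x^*}\|x^*_1 - x^*\|_{x^*} \le \Delta t\sqrt{\nu}\,\|x^*_1 - x^*\|_{x^*}.
\end{aligned}$$

By the definition of $\Delta^*$, if $t > (\sqrt{\nu} + 1)\Delta t$, then the above inequality leads to

$$\Delta^* \le \bar\Delta^* := \frac{\sqrt{\nu}\,\Delta t}{t - (\sqrt{\nu} + 1)\Delta t}. \tag{4.17}$$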
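Note that, by the definition of $\bar\Delta^*$ in (4.17), we have

$$\Delta t = \frac{\bar\Delta^*\,t}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*}. \tag{4.18}$$

On the other hand, using the definitions of $\Delta$ and $\delta$, we have

$$\begin{aligned}
\Delta = \|x_{\delta 1} - x_\delta\|_{x_\delta} &\le \frac{1}{1-\delta}\|x_{\delta 1} - x_\delta\|_{x^*} \le \frac{1}{1-\delta}\Big[\|x_{\delta 1} - x^*_1\|_{x^*} + \|x^*_1 - x^*\|_{x^*} + \|x^* - x_\delta\|_{x^*}\Big]\\
&\le \frac{1}{1-\delta}\Big[\frac{\delta_1}{1-\Delta^*} + \Delta^* + \delta\Big] \le \frac{1}{1-\delta}\Big[\frac{\delta}{1-\Delta^*} + \Delta^* + \delta\Big],
\end{aligned} \tag{4.19}$$

where the last inequality uses $\delta_1 \le \delta$.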



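Now, we need to find a condition such that $\Delta \le \bar\Delta$, where $\bar\Delta$ is given in Theorem 4.3. Due to (4.19), this condition holds if $\frac{\delta}{1-\Delta^*} + \Delta^* \le (1-\delta)\bar\Delta - \delta$. Since $\Delta^* \le \bar\Delta^*$ due to (4.17) and the left-hand side is increasing in $\Delta^*$, we impose the sufficient condition $\frac{\delta}{1-\bar\Delta^*} + \bar\Delta^* \le (1-\delta)\bar\Delta - \delta$, which, for $\bar\Delta^* \in [0,1)$, is equivalent to

$$0 \le \bar\Delta^* \le \frac{(1-\delta)\bar\Delta - \delta + 1 - \sqrt{((1-\delta)\bar\Delta - \delta - 1)^2 + 4\delta}}{2}, \tag{4.20}$$

provided that $\delta < \frac{\bar\Delta}{2 + \bar\Delta}$. Thus, we can fix $\bar\Delta^*$ at

$$\bar\Delta^* := \frac{(1-\delta)\bar\Delta - \delta + 1 - \sqrt{((1-\delta)\bar\Delta - \delta - 1)^2 + 4\delta}}{2}. \tag{4.21}$$

The update rule for the barrier parameter $t$ becomes

$$t_+ := t - \Delta t = (1-\sigma)t, \quad\text{where}\quad \sigma := \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*} \in (0,1).$$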
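The relation between (4.17), (4.18) and the factor $\sigma$ can be checked numerically: choosing $\Delta t = \sigma t$ and substituting it into the right-hand side of (4.17) returns exactly the prescribed $\bar\Delta^*$. This is a minimal Python sketch; the sample values of $\bar\Delta^*$, $\nu$ and $t$ are assumptions for illustration.

```python
import math


def sigma_factor(dbar_star, nu):
    """Contraction factor sigma of the update rule t_+ = (1 - sigma) t."""
    return dbar_star / (math.sqrt(nu) * (dbar_star + 1.0) + dbar_star)


def delta_star_bound(dt, t, nu):
    """Right-hand side of (4.17); requires t > (sqrt(nu) + 1) * dt."""
    assert t > (math.sqrt(nu) + 1.0) * dt
    return math.sqrt(nu) * dt / (t - (math.sqrt(nu) + 1.0) * dt)


dbar_star, nu, t = 0.067399, 100, 1.0
sigma = sigma_factor(dbar_star, nu)
dt = sigma * t                                 # Delta t chosen as in (4.18)
print(sigma, delta_star_bound(dt, t, nu))      # the bound recovers dbar_star
```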



Finally, we show that the conditions given in Theorem 4.3, (4.20) and (4.21) are well-defined. Indeed, let us fix $\delta := 0.01$. Then we can compute the values of $\beta_*$ and $\beta^*$ as

$$\beta_* \approx 0.021371 < \beta^* \approx 0.356037.$$

Therefore, if we choose $\beta := \beta^*/4 \approx 0.089009 > \beta_*$, then

$$\bar\Delta \approx 0.089012 \quad\text{and}\quad \bar\Delta^* \approx 0.067399.$$
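These values can be reproduced with the following Python sketch, which locates the first two roots of $\mathcal{P}$ from (4.16) by bisection and then evaluates $\theta$, $\bar\Delta$ (Theorem 4.3) and $\bar\Delta^*$ (4.21) for $\beta = \beta^*/4$. The brackets $[0, 0.1]$ and $[0.1, 1]$ are an assumption that holds for $\delta = 0.01$ ($\mathcal{P}(0) < 0$, $\mathcal{P}(0.1) > 0$, $\mathcal{P}(1) < 0$); the helper names are illustrative.

```python
import math


def cubic(beta, delta):
    """P(beta) from (4.16)."""
    s = 1.0 - delta
    p = delta * (s ** -2 + 2.0 / s)
    c0 = -2.0 * delta * s ** 2
    c1 = s * ((1.0 + delta) ** 2 - p)
    c2 = p - 3.0 - 2.0 * delta ** 2 + 2.0 * delta
    return c0 + c1 * beta + c2 * beta ** 2 + s * beta ** 3


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    assert flo * f(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)


def parameters(delta, frac=0.25, split=0.1):
    """beta_*, beta^*, beta = frac*beta^*, Delta-bar and Delta-bar^*."""
    b_lo = bisect(lambda b: cubic(b, delta), 0.0, split)
    b_hi = bisect(lambda b: cubic(b, delta), split, 1.0)
    beta = frac * b_hi
    s = 1.0 - delta
    p = delta * (s ** -2 + 2.0 / s)
    q = s * beta - 2.0 * delta
    theta = (math.sqrt(p * p + 4.0 * q) - p) / 2.0
    dbar = (theta * (1.0 - delta - beta) - beta) / (1.0 + 2.0 * theta)
    c = s * dbar - delta
    dbar_star = (c + 1.0 - math.sqrt((c - 1.0) ** 2 + 4.0 * delta)) / 2.0
    return b_lo, b_hi, beta, dbar, dbar_star


print(parameters(0.01))
```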

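4.4.4. The algorithm and its convergence. We are now in a position to present the algorithm and its convergence. Before presenting the algorithm, we need to find a stopping criterion. Let $\lambda := \lambda_{\tilde d(\cdot,t)}(y)$ denote the exact Newton decrement. By using Lemma 4.6 b), we have

$$\lambda \le (1-\delta)^{-1}(\tilde\lambda + \delta), \tag{4.22}$$

provided that $\delta < 1$ and $\tilde\lambda \le \beta < 1$. Consequently, if $\tilde\lambda \le \beta$ then $\lambda \le (1-\delta)^{-1}(\beta + \delta)$. Let us define $\vartheta := (1-\delta)^{-1}(\beta + \delta)$. It follows from Lemma 3.7 that if $t\,\omega_*(\vartheta) \le \varepsilon_d$ for a given tolerance $\varepsilon_d > 0$, then $y$ is a $2\varepsilon_d$-solution of (2.2).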
The algorithmic framework presented in Subsection 4.2 is now described in detail as follows.

Algorithm 1. (Path-following algorithm with IPFNT iterations)

Initialization: Perform the following steps:

1. Choose $\delta \in [0,\delta_{\max})$ and compute $\beta_*$ and $\beta^*$ as the first and second roots of $\mathcal{P}$ defined by (4.16), respectively.

2. Fix some $\beta \in (\beta_*,\beta^*)$ (e.g. $\beta = \frac{1}{4}\beta^*$).

3. Choose an initial value $t = t_0 > 0$.

Phase 1. Apply Algorithm 2 presented in Subsection 4.5 to find $y^0 \in Y$ such that

$\lambda_{\tilde d_\delta(\cdot,t_0)}(y^0) \le \beta$.

Phase 2.

Initialization: Perform the following steps:

1. Given a tolerance $\varepsilon_d > 0$.

2. Compute $\bar\Delta$ as in Theorem 4.3. Then, compute $\bar\Delta^*$ by (4.21).

3. Compute the factor $\sigma := \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*}$.

4. Compute the accuracy factor $\gamma := \frac{\delta}{(\nu + 2\sqrt{\nu})(1 + \delta)}$.
Iteration: Perform the following loop.
For $k = 0, 1, \ldots$ do

1. If $t_k \le \varepsilon_d/\omega_*(\vartheta)$, where $\vartheta := (1-\delta)^{-1}(\beta + \delta)$, then terminate.

2. Compute an accuracy $\varepsilon_p^k$ for the primal subproblem from the factor $\gamma$ (cf. (4.9)).

3. Update $t_{k+1} := (1-\sigma)t_k$.

4. Solve (3.2) approximately in parallel up to the given tolerance to obtain $x_\delta(y^k,t_{k+1})$.

5. Compute $\nabla d_\delta(y^k,t_{k+1})$ and $\nabla^2 d_\delta(y^k,t_{k+1})$ according to (4.3).

6. Update $y^{k+1} := y^k - \nabla^2 d_\delta(y^k,t_{k+1})^{-1}\nabla d_\delta(y^k,t_{k+1})$.
End of For.

The core step of Phase 2 in Algorithm 1 is Step 4, where we need to solve two convex optimization problems to compute the gradient vector and the Hessian matrix of $d_\delta(\cdot,t_{k+1})$ at Step 5. These quantities require an approximate solution $x_\delta(y^k,t_{k+1})$, the gradient vector $\nabla F(x_\delta(y^k,t_{k+1}))$ and the Hessian matrix $\nabla^2F(x_\delta(y^k,t_{k+1}))$, which can also be computed in parallel. Note that Step 4 actually requires solving the system of nonlinear equations (3.9) (see the section on implementation details). The update rule of $t$ at Step 3 can be made adaptive by using $\|\nabla F(x_\delta)\|^*_{x_\delta}$ instead of its upper bound $\sqrt{\nu}$; for example, $\Delta t$ can be computed from a quantity $R_\delta$ involving $\|\nabla F(x_\delta)\|^*_{x_\delta}$ in place of $\sqrt{\nu}$.

The stopping criterion at Step 1 can be replaced by $\omega_*(\vartheta_k)t_k \le \varepsilon_d$, where $\vartheta_k := (1-\delta)^{-1}[\lambda_{\tilde d_\delta(\cdot,t_k)}(y^k) + \delta]$, due to Lemma 4.6.

Let us define $\tilde\lambda_{k+1} := \lambda_{\tilde d_\delta(\cdot,t_{k+1})}(y^{k+1})$ and $\tilde\lambda_k := \lambda_{\tilde d_\delta(\cdot,t_k)}(y^k)$. Then the local convergence of Algorithm 1 is stated in the following theorem.

Theorem 4.4. Let $\{(y^k,t_k)\}$ be a sequence generated by Algorithm 1. Then the number of iterations $k_{\max}$ needed to obtain a $2\varepsilon_d$-solution of (2.2) does not exceed

$$k_{\max} := \left\lfloor \frac{\ln\big(\varepsilon_d/(t_0\,\omega_*(\vartheta))\big)}{\ln(1-\sigma)} \right\rfloor + 1, \tag{4.23}$$

where $\sigma = \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*} \in (0,1)$ and $\vartheta = (1-\delta)^{-1}(\beta + \delta) \in (0,1)$.

Proof. Note that $y^k$ is a $2\varepsilon_d$-solution of (2.2) if $t_k \le \varepsilon_d/\omega_*(\vartheta)$ due to Lemma 3.7, where $\vartheta = (1-\delta)^{-1}(\beta + \delta)$. Since $t_k = (1-\sigma)^k t_0$ due to Step 3, we require $(1-\sigma)^k \le \frac{\varepsilon_d}{t_0\,\omega_*(\vartheta)}$. Consequently, we obtain (4.23). $\square$

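Remark 3 (The worst-case complexity). Since $(1-\sigma)^{-1} = 1 + \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1)}$, which implies $-\ln(1-\sigma) \approx \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1)}$, it follows from Theorem 4.4 that the complexity of Algorithm 1 is $O\big(\sqrt{\nu}\ln\frac{t_0}{\varepsilon_d}\big)$.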
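The bound (4.23) is easy to evaluate; the Python sketch below computes $k_{\max}$ for a few barrier parameters using the values $\delta = 0.01$, $\beta \approx 0.089009$ and $\bar\Delta^* \approx 0.067399$ from Subsection 4.4.3, with $t_0 = 1$ and $\varepsilon_d = 10^{-6}$ as assumed sample inputs. Quadrupling $\nu$ roughly doubles $k_{\max}$, which is the $O(\sqrt{\nu})$ behaviour of Remark 3.

```python
import math


def omega_star(tau):
    """omega_*(tau) = -tau - ln(1 - tau), defined for 0 <= tau < 1."""
    return -tau - math.log(1.0 - tau)


def sigma_factor(dbar_star, nu):
    return dbar_star / (math.sqrt(nu) * (dbar_star + 1.0) + dbar_star)


def k_max(t0, eps_d, sigma, vartheta):
    """Iteration bound (4.23)."""
    ratio = eps_d / (t0 * omega_star(vartheta))
    return math.floor(math.log(ratio) / math.log(1.0 - sigma)) + 1


delta, beta, dbar_star = 0.01, 0.089009, 0.067399
vartheta = (beta + delta) / (1.0 - delta)
for nu in (100, 400, 1600):
    print(nu, k_max(1.0, 1e-6, sigma_factor(dbar_star, nu), vartheta))
```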

Remark 4 (Linear convergence). The rate of convergence of the sequence $\{t_k\}$ is linear and the contraction rate is not greater than $1 - \sigma$. Note that if $\lambda_{\tilde d_\delta(\cdot,t)}(y) \le \beta$, then it follows from (3.11) that $\lambda_{d_\delta(\cdot,t)}(y) \le \beta\sqrt{t}$. Therefore, the sequence of Newton decrements $\{\lambda_{d(\cdot,t_k)}(y^k)\}_k$ of $d$ also converges linearly to zero with a contraction factor less than or equal to $\sqrt{1-\sigma}$.

Remark 5 (Recovering the feasibility). Since $\nabla d_\delta(y,t) = Ax_\delta(y,t) - b = t\nabla\tilde d_\delta(y,t)$, we have $|||Ax_\delta(y,t) - b|||^*_y = t\,|||\nabla\tilde d_\delta(y,t)|||^*_y = t\tilde\lambda \le t\beta$. If we define the inexact feasibility gap at $x_\delta(y,t)$ as

$$\mathcal{G}_{\rm feas}(y,t) := |||Ax_\delta(y,t) - b|||^*_y,$$

then $\mathcal{G}_{\rm feas}(y,t) \le t\beta$, which shows that $\mathcal{G}_{\rm feas}(y,t)$ converges linearly to zero with the same rate as $t$.

Remark 6 (The inexactness in the IPFNT direction (4.12)). Note that we can apply an inexact method to solve the linear system in (4.12). Under appropriate assumptions on the inexact term, we can still prove the convergence of the algorithm. For more details on inexact Newton methods, we refer the reader to [?].



4.5. Phase 1 - Finding a starting point. Phase 1 of the algorithmic framework aims to find $y^0 \in Y$ such that $\lambda_{\tilde d_\delta(\cdot,t_0)}(y^0) \le \beta$. In this subsection, we consider an inexact perturbed damped Newton (IPDNT) method for finding such a point $y^0$.

4.5.1. Inexact perturbed damped Newton iteration. Let us fix $t = t_0 > 0$ and choose an accuracy $\delta > 0$. We assume that the current iterate $y \in Y$ is given, and we compute the next iterate $y_+$ by applying the IPDNT iteration to $d_\delta(\cdot,t_0)$ as

$$y_+ := y - \alpha(y)\nabla^2 d_\delta(y,t_0)^{-1}\nabla d_\delta(y,t_0), \tag{4.24}$$

where $\alpha := \alpha(y) > 0$ is a step size which will be chosen appropriately. Note that since (4.24) is invariant under linear transformations, it is equivalent to

$$y_+ := y - \alpha(y)\nabla^2\tilde d_\delta(y,t_0)^{-1}\nabla\tilde d_\delta(y,t_0). \tag{4.25}$$

It follows from (3.11) that $\tilde d(\cdot,t_0)$ is standard self-concordant; hence, by [17, Theorem 4.1.8], we have

$$\tilde d(y_+,t_0) \le \tilde d(y,t_0) + \nabla\tilde d(y,t_0)^T(y_+ - y) + \omega_*(\|y_+ - y\|_y), \tag{4.26}$$

provided that $\|y_+ - y\|_y < 1$. On the other hand, (4.7) implies that

$$0 \le \omega(\delta(x_\delta,x^*)) \le \tilde d(y,t_0) - \tilde d_\delta(y,t_0) \le \omega_*(\delta(x_\delta,x^*)), \tag{4.27}$$

which bounds the gap between $\tilde d(\cdot,t_0)$ and $\tilde d_\delta(\cdot,t_0)$. In order to analyze the convergence of the IPDNT iteration (4.24), we denote by

$$\begin{aligned}
\delta_+ &:= \|x_\delta(y_+,t_0) - x^*(y_+,t_0)\|_{x^*(y_+,t_0)},\\
\delta &:= \|x_\delta(y,t_0) - x^*(y,t_0)\|_{x^*(y,t_0)},\\
\lambda_0 &:= \lambda_{\tilde d_\delta(\cdot,t_0)}(y) = \alpha(y)^{-1}|||y_+ - y|||_y,
\end{aligned} \tag{4.28}$$

the solution differences of $d(\cdot,t_0)$ and $d_\delta(\cdot,t_0)$, and the Newton decrement of $\tilde d_\delta(\cdot,t_0)$, respectively.

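4.5.2. Finding the step size $\alpha(y)$. Now, we find an appropriate step size $\alpha(y) \in (0,1]$ such that the sequence generated by (4.25) converges to $y^0$. Let $p := y_+ - y$. From (4.26) and (4.27), we have

$$\begin{aligned}
\tilde d_\delta(y_+,t_0) &\le \tilde d(y_+,t_0) \le \tilde d(y,t_0) + \nabla\tilde d(y,t_0)^T(y_+ - y) + \omega_*(\|y_+ - y\|_y)\\
&\le \tilde d_\delta(y,t_0) + \nabla\tilde d(y,t_0)^T(y_+ - y) + \omega_*(\|y_+ - y\|_y) + \omega_*(\delta)\\
&= \tilde d_\delta(y,t_0) + \nabla\tilde d_\delta(y,t_0)^Tp + [\nabla\tilde d(y,t_0) - \nabla\tilde d_\delta(y,t_0)]^Tp + \omega_*(\|p\|_y) + \omega_*(\delta)\\
&\le \tilde d_\delta(y,t_0) - \alpha\lambda_0^2 + \|\nabla\tilde d(y,t_0) - \nabla\tilde d_\delta(y,t_0)\|^*_y\|p\|_y + \omega_*(\|p\|_y) + \omega_*(\delta)\\
&\le \tilde d_\delta(y,t_0) - \alpha\lambda_0^2 + \delta\|p\|_y + \omega_*(\|p\|_y) + \omega_*(\delta).
\end{aligned} \tag{4.29}$$

Furthermore, from (4.42) and the definitions of $\nabla^2\tilde d$ and $\nabla^2\tilde d_\delta$, we have

$$(1-\delta)^2\nabla^2\tilde d_\delta(y,t_0) \preceq \nabla^2\tilde d(y,t_0) \preceq (1-\delta)^{-2}\nabla^2\tilde d_\delta(y,t_0).$$

This inequality implies that $(1-\delta)|||p|||_y \le \|p\|_y \le (1-\delta)^{-1}|||p|||_y$. Combining this inequality, (4.25) and the definition of $\lambda_0$ in (4.28), we get

$$\alpha(1-\delta)\lambda_0 \le \|p\|_y \le \alpha(1-\delta)^{-1}\lambda_0.$$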

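Let us assume that $\alpha\lambda_0 + \delta < 1$. By substituting the right-hand side of this inequality into (4.29) and observing that the right-hand side of (4.29) is nondecreasing with respect to $\|p\|_y$, we get

$$\tilde d_\delta(y_+,t_0) \le \tilde d_\delta(y,t_0) - \alpha\lambda_0^2 + \frac{\alpha\lambda_0\delta}{1-\delta} + \omega_*\Big(\frac{\alpha\lambda_0}{1-\delta}\Big) + \omega_*(\delta). \tag{4.30}$$

Now, let us simplify the last terms of (4.30), which we denote by $T$, as follows:

$$\begin{aligned}
T &:= -\alpha\lambda_0^2 + \frac{\alpha\lambda_0\delta}{1-\delta} + \omega_*\Big(\frac{\alpha\lambda_0}{1-\delta}\Big) + \omega_*(\delta)\\
&= -\alpha\lambda_0^2 + \frac{\alpha\lambda_0\delta}{1-\delta} - \frac{\alpha\lambda_0}{1-\delta} - \ln\Big(1 - \frac{\alpha\lambda_0}{1-\delta}\Big) - \delta - \ln(1-\delta)\\
&= -\alpha\lambda_0^2 - (\alpha\lambda_0 + \delta) - \ln\big(1 - (\alpha\lambda_0 + \delta)\big)\\
&= -\alpha\lambda_0^2 + \omega_*(\alpha\lambda_0 + \delta).
\end{aligned} \tag{4.31}$$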
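Suppose that we can choose $\eta > 0$ such that $\alpha\lambda_0^2 - \omega_*(\alpha\lambda_0 + \delta) = \omega(\eta)$. Taking $\eta := \frac{\alpha\lambda_0 + \delta}{1 - \alpha\lambda_0 - \delta}$, this requirement leads to $\alpha\lambda_0^2 = \frac{(\alpha\lambda_0 + \delta)^2}{1 - (\alpha\lambda_0 + \delta)}$, one solution of which is

$$\alpha = \frac{(1-\delta)\lambda_0 - 2\delta + \sqrt{(1-\delta)^2\lambda_0^2 - 4\delta\lambda_0}}{2\lambda_0(1 + \lambda_0)}, \tag{4.32}$$

provided that $0 \le \delta \le \bar\delta := \frac{2 + \lambda_0 - 2\sqrt{1 + \lambda_0}}{\lambda_0}$. Consequently, we deduce

$$\eta = \lambda_0\,\frac{(1-\delta)\lambda_0 - 2\delta + \sqrt{(1-\delta)^2\lambda_0^2 - 4\delta\lambda_0}}{(1+\delta)\lambda_0 + \sqrt{(1-\delta)^2\lambda_0^2 - 4\delta\lambda_0}}. \tag{4.33}$$

Note that if $\delta = 0$, then $\alpha = \frac{1}{1 + \lambda_0}$ and $\eta = \lambda_0$, and the IPDNT iteration (4.24) becomes the exact damped Newton iteration as in [17].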

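We assume that $\lambda_0 > \beta$ for a given $\beta \in (0,1)$. Let us fix $\delta$ such that

$$0 < \delta < \delta^* := \frac{2 + \beta - 2\sqrt{1 + \beta}}{\beta} = \frac{\beta}{2 + \beta + 2\sqrt{1 + \beta}}. \tag{4.34}$$

Next, we choose the step size $\alpha$ as

$$\alpha(y) := \frac{(1-\delta)\lambda_0 - 2\delta + \sqrt{(1-\delta)^2\lambda_0^2 - 4\delta\lambda_0}}{2\lambda_0(1 + \lambda_0)} \in (0,1]. \tag{4.35}$$

Then the IPDNT iteration (4.24) with $\alpha(y)$ given by (4.35) generates a new point $y_+$ such that

$$\tilde d_\delta(y_+,t_0) \le \tilde d_\delta(y,t_0) - \omega(\eta), \tag{4.36}$$

where

$$\eta := \beta\,\frac{(1-\delta)\beta - 2\delta + \sqrt{(1-\delta)^2\beta^2 - 4\delta\beta}}{(1+\delta)\beta + \sqrt{(1-\delta)^2\beta^2 - 4\delta\beta}} \in (0,1). \tag{4.37}$$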



Finally, let us estimate the constant $\eta$ for the case $\beta \approx 0.089009$. We first obtain $\delta^* \approx 0.02131$. Let $\delta = \frac{1}{2}\delta^* \approx 0.010657$. Then we get $\eta \approx 0.0754963$. Consequently, $\omega(\eta) \approx 0.002714$.
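The Phase 1 quantities (4.33)-(4.35) and (4.37) can be evaluated with the Python sketch below. Besides reproducing $\delta^*$ and $\eta$ for $\beta \approx 0.089009$, it checks two facts stated in the text: the step size reduces to $1/(1+\lambda_0)$ and $\eta$ to $\lambda_0$ when $\delta = 0$, and the choice (4.32) satisfies $\alpha\lambda_0^2 - \omega_*(\alpha\lambda_0 + \delta) = \omega(\eta)$. Here $\omega(\tau) = \tau - \ln(1+\tau)$ and $\omega_*(\tau) = -\tau - \ln(1-\tau)$ as in [17]; the function names are illustrative.

```python
import math


def omega(tau):
    return tau - math.log(1.0 + tau)


def omega_star(tau):
    return -tau - math.log(1.0 - tau)


def delta_star(beta):
    """Upper bound (4.34) on the primal accuracy used in Phase 1."""
    return beta / (2.0 + beta + 2.0 * math.sqrt(1.0 + beta))


def step_size(lam, delta):
    """IPDNT step size (4.35); equals 1/(1 + lam) when delta = 0."""
    root = math.sqrt((1.0 - delta) ** 2 * lam ** 2 - 4.0 * delta * lam)
    return ((1.0 - delta) * lam - 2.0 * delta + root) / (2.0 * lam * (1.0 + lam))


def eta(lam, delta):
    """Decrease parameter (4.33); with lam = beta this is (4.37)."""
    root = math.sqrt((1.0 - delta) ** 2 * lam ** 2 - 4.0 * delta * lam)
    return lam * ((1.0 - delta) * lam - 2.0 * delta + root) / ((1.0 + delta) * lam + root)


beta = 0.089009
d = 0.5 * delta_star(beta)
print(delta_star(beta), d, eta(beta, d), omega(eta(beta, d)))
```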

4.5.3. The algorithm and its worst-case complexity. In summary, the algorithm for finding $y^0 \in Y$ is presented in detail as follows.

Algorithm 2. (Finding a starting point $y^0 \in Y$)

Initialization: Perform the following steps:

1. Input $\beta \in (\beta_*,\beta^*)$ and $t_0 > 0$ as desired (e.g. $\beta = \frac{1}{4}\beta^* \approx 0.089009$).

2. Take an arbitrary point $y^{0,0} \in Y$.

3. Compute $\delta^* := \frac{\beta}{2 + \beta + 2\sqrt{1 + \beta}}$ and fix $\delta \in (0,\delta^*)$ (e.g. $\delta = 0.5\delta^*$).

4. Compute an accuracy $\varepsilon_p := \frac{t_0\delta}{2(\nu + 2\sqrt{\nu})(1 + \delta)}$.

Iteration: Perform the following loop.
For $j = 0, 1, \ldots$ do

1. Solve the primal subproblem (3.2) approximately in parallel up to the accuracy $\varepsilon_p$ to obtain $x_\delta(y^{0,j},t_0)$.

2. Compute $\lambda_j := \lambda_{\tilde d_\delta(\cdot,t_0)}(y^{0,j})$.

3. If $\lambda_j \le \beta$ then set $y^0 := y^{0,j}$ and terminate.

4. Update $y^{0,j+1} := y^{0,j} - \alpha_j\nabla^2\tilde d_\delta(y^{0,j},t_0)^{-1}\nabla\tilde d_\delta(y^{0,j},t_0)$, where $\alpha_j \in (0,1]$ is computed by

$$\alpha_j := \frac{(1-\delta)\lambda_j - 2\delta + \sqrt{(1-\delta)^2\lambda_j^2 - 4\delta\lambda_j}}{2\lambda_j(1 + \lambda_j)}.$$

End of For.



The convergence of this algorithm is stated in the following theorem. 

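Theorem 4.5. The number of iterations needed to terminate Algorithm 2 does not exceed

$$J_{\max} := \left\lfloor \frac{\tilde d_\delta(y^{0,0},t_0) - \tilde d^*(t_0) + \omega_*(\delta)}{\omega(\eta)} \right\rfloor + 1, \tag{4.38}$$

where $\tilde d^*(t_0) := \min_{y \in Y}\tilde d(y,t_0)$ and $\eta$ is given by (4.37).

Proof. Summing up (4.36) from $j = 0$ to $j = k-1$ and then using (4.27), we have

$$0 \le \tilde d(y^{0,k},t_0) - \tilde d^*(t_0) \le \tilde d_\delta(y^{0,k},t_0) + \omega_*(\delta) - \tilde d^*(t_0) \le \tilde d_\delta(y^{0,0},t_0) + \omega_*(\delta) - \tilde d^*(t_0) - k\,\omega(\eta).$$

This inequality, together with (3.11) and (4.4), implies

$$k \le \frac{\tilde d_\delta(y^{0,0},t_0) - \tilde d^*(t_0) + \omega_*(\delta)}{\omega(\eta)}.$$

Hence, the maximum number of iterations in Algorithm 2 does not exceed $J_{\max}$ defined by (4.38). $\square$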

Since $\tilde d^*(t_0)$ is not available, the number $J_{\max}$ in (4.38) only gives an upper bound for Algorithm 2. However, we do not use $J_{\max}$ as a stopping criterion in this algorithm.

4.6. The proof of Lemma 4.2. First, we prove the following lemma, which will be used to prove the main inequality in Lemma 4.2.

Lemma 4.6. Suppose that Assumptions A.1 and A.2 are satisfied. Then
a) $\nabla^2 d$ and $\nabla^2 d_\delta$ defined by (3.10) and (4.3), respectively, satisfy

$$(1-\delta_+)^2\nabla^2 d(y_+,t_+) \preceq \nabla^2 d_\delta(y_+,t_+) \preceq (1-\delta_+)^{-2}\nabla^2 d(y_+,t_+),$$

where $\delta_+ < 1$ is defined by (4.6).
b) Moreover, one has

$$\|\nabla\tilde d_\delta(y,t) - \nabla\tilde d(y,t)\|^*_y \le \|x_\delta - x^*\|_{x^*} = \delta(x_\delta,x^*). \tag{4.39}$$

c) If $\Delta < 1$ then

$$\tilde\lambda_1 \le \frac{\tilde\lambda + \Delta}{1 - \Delta}. \tag{4.40}$$



Proof. Since $F$ is standard self-concordant, for any $z \in W^0(x,1)$ it follows from [17, Theorem 4.1.6] that

$$(1 - \|z - x\|_x)^2\nabla^2F(x) \preceq \nabla^2F(z) \preceq (1 - \|z - x\|_x)^{-2}\nabla^2F(x). \tag{4.41}$$

Since $\nabla^2F(x)$ is symmetric positive definite, by applying [1, Proposition 8.6.6] to the two matrices $(1 - \|z - x\|_x)^2\nabla^2F(x)$ and $\nabla^2F(z)$, and then to the two matrices $(1 - \|z - x\|_x)^{-2}\nabla^2F(x)$ and $\nabla^2F(z)$, we obtain

$$(1 - \|z - x\|_x)^2A\nabla^2F(x)^{-1}A^T \preceq A\nabla^2F(z)^{-1}A^T \preceq (1 - \|z - x\|_x)^{-2}A\nabla^2F(x)^{-1}A^T. \tag{4.42}$$

Using again [1, Proposition 8.6.6] for (4.42), we get

$$(1 - \|z - x\|_x)^2A^T[A\nabla^2F(x)^{-1}A^T]^{-1}A \preceq A^T[A\nabla^2F(z)^{-1}A^T]^{-1}A \preceq (1 - \|z - x\|_x)^{-2}A^T[A\nabla^2F(x)^{-1}A^T]^{-1}A. \tag{4.43}$$

Now, using (3.10), we have $\nabla^2 d(y,t) = t^{-1}A\nabla^2F(x^*)^{-1}A^T$. Similarly, using (4.3), we get $\nabla^2 d_\delta(y,t) = t^{-1}A\nabla^2F(x_\delta)^{-1}A^T$. Substituting these relations with $x = x^*_+$ and $z = x_{\delta+}$ into (4.42), and noting that $\delta_+ = \delta(x_{\delta+},x^*_+)$ is defined by (4.6), we obtain a).

Next, we prove b). For any $x \in \mathrm{dom}(F)$, the Hessian matrix $\nabla^2F(x)$ is symmetric positive definite. Let us define

$$M(x) := \begin{bmatrix} \nabla^2F(x) & A^T\\ A & A\nabla^2F(x)^{-1}A^T \end{bmatrix}.$$

First, we show that $M(x)$ is positive semidefinite. Indeed, for any $z = (u,v) \in \mathbb{R}^n \times \mathbb{R}^m$, we have

$$\begin{aligned}
z^TM(x)z &= u^T\nabla^2F(x)u + u^TA^Tv + v^TAu + v^TA\nabla^2F(x)^{-1}A^Tv\\
&= \|\nabla^2F(x)^{1/2}u\|^2 + 2(\nabla^2F(x)^{1/2}u)^T(\nabla^2F(x)^{-1/2}A^Tv) + \|\nabla^2F(x)^{-1/2}A^Tv\|^2\\
&= \|\nabla^2F(x)^{1/2}u + \nabla^2F(x)^{-1/2}A^Tv\|^2 \ge 0,
\end{aligned}$$

which shows that $M(x) \succeq 0$. Now, since $A$ has full row rank, $A\nabla^2F(x)^{-1}A^T$ is also symmetric positive definite. By applying the Schur complement to $M(x)$ [?], we obtain

$$A^T[A\nabla^2F(x)^{-1}A^T]^{-1}A \preceq \nabla^2F(x). \tag{4.44}$$

To prove (4.39), we note that $\nabla d_\delta(y,t) - \nabla d(y,t) = A(x_\delta - x^*)$. Thus $\nabla\tilde d_\delta(y,t) - \nabla\tilde d(y,t) = t^{-1}A(x_\delta - x^*)$. This implies

$$\begin{aligned}
\big[\|\nabla\tilde d_\delta(y,t) - \nabla\tilde d(y,t)\|^*_y\big]^2 &= \frac{1}{t^2}(x_\delta - x^*)^TA^T\nabla^2\tilde d(y,t)^{-1}A(x_\delta - x^*)\\
&= (x_\delta - x^*)^TA^T[A\nabla^2F(x^*)^{-1}A^T]^{-1}A(x_\delta - x^*)\\
&\le (x_\delta - x^*)^T\nabla^2F(x^*)(x_\delta - x^*) = \|x_\delta - x^*\|^2_{x^*},
\end{aligned}$$

which is indeed (4.39).

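Finally, we prove (4.40). By using the definitions of $\nabla\tilde d_\delta(\cdot,t_+)$ and $\nabla^2\tilde d_\delta(\cdot,t_+)$ in (4.3), and of $\tilde d_\delta(\cdot,t_+)$ in (4.4), for any feasible point $\hat x$ of (2.1), it follows from the definition of $\tilde\lambda_1$ in (4.5) and $A\hat x = b$ that

$$\begin{aligned}
\tilde\lambda_1^2 = \big[|||\nabla\tilde d_\delta(y,t_+)|||^*_y\big]^2 &= \nabla\tilde d_\delta(y,t_+)^T\nabla^2\tilde d_\delta(y,t_+)^{-1}\nabla\tilde d_\delta(y,t_+)\\
&= (x_{\delta 1} - \hat x)^TA^T[A\nabla^2F(x_{\delta 1})^{-1}A^T]^{-1}A(x_{\delta 1} - \hat x).
\end{aligned} \tag{4.45}$$

Since $\Delta = \|x_{\delta 1} - x_\delta\|_{x_\delta} < 1$ by assumption, we have $x_{\delta 1} \in W^0(x_\delta,1)$. Applying the right-hand side of (4.43) with $x = x_\delta$ and $z = x_{\delta 1}$, we obtain

$$\tilde\lambda_1^2 \le \frac{1}{(1-\Delta)^2}(x_{\delta 1} - \hat x)^TA^T[A\nabla^2F(x_\delta)^{-1}A^T]^{-1}A(x_{\delta 1} - \hat x). \tag{4.46}$$

Now, for any symmetric positive semidefinite matrix $Q \in \mathbb{R}^{n\times n}$ and $u, v \in \mathbb{R}^n$, one can easily show that

$$(u + v)^TQ(u + v) \le \big(\sqrt{u^TQu} + \sqrt{v^TQv}\big)^2. \tag{4.47}$$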
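For completeness, (4.47) follows from the Cauchy-Schwarz inequality for the semi-inner product induced by $Q$, namely $u^TQv = (Q^{1/2}u)^T(Q^{1/2}v) \le \sqrt{u^TQu}\,\sqrt{v^TQv}$:

$$(u + v)^TQ(u + v) = u^TQu + 2u^TQv + v^TQv \le u^TQu + 2\sqrt{u^TQu}\,\sqrt{v^TQv} + v^TQv = \big(\sqrt{u^TQu} + \sqrt{v^TQv}\big)^2.$$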



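Since $H_\delta := A^T[A\nabla^2F(x_\delta)^{-1}A^T]^{-1}A$ is symmetric positive semidefinite, applying (4.47) with $Q = H_\delta$, $u = x_{\delta 1} - x_\delta$ and $v = x_\delta - \hat x$ to (4.46), we have

$$\tilde\lambda_1 \le \frac{1}{1-\Delta}\Big\{\big[(x_{\delta 1} - x_\delta)^TH_\delta(x_{\delta 1} - x_\delta)\big]^{1/2} + \big[(x_\delta - \hat x)^TH_\delta(x_\delta - \hat x)\big]^{1/2}\Big\}. \tag{4.48}$$

Note that $H_\delta \preceq \nabla^2F(x_\delta)$ due to (4.44). The first term in the braces on the right-hand side of (4.48) satisfies

$$(x_{\delta 1} - x_\delta)^TH_\delta(x_{\delta 1} - x_\delta) \le (x_{\delta 1} - x_\delta)^T\nabla^2F(x_\delta)(x_{\delta 1} - x_\delta) = \Delta^2. \tag{4.49}$$

On the other hand, by replacing $(x_{\delta 1}, t_+)$ with $(x_\delta, t)$ in (4.45), we get

$$\tilde\lambda^2 = (x_\delta - \hat x)^TA^T[A\nabla^2F(x_\delta)^{-1}A^T]^{-1}A(x_\delta - \hat x) = (x_\delta - \hat x)^TH_\delta(x_\delta - \hat x). \tag{4.50}$$

Combining (4.48), (4.49) and (4.50), we obtain

$$\tilde\lambda_1^2 \le \frac{(\tilde\lambda + \Delta)^2}{(1-\Delta)^2},$$

which is equivalent to (4.40). $\square$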

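The proof of Lemma 4.2. Since $\delta_1 + 2\Delta + \tilde\lambda < 1$, we have $\delta_1 < 1$, $\Delta < 1/2$ and $\tilde\lambda < 1$. The proof of Lemma 4.2 is divided into several steps as follows.
Step 1. First, we prove the following inequality:

$$\tilde\lambda_+ \le \frac{1}{1-\delta_+}\left\{\frac{1}{1 - \|p\|_y}\left[\delta_1 + \frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2}\|p\|_y + \frac{\|p\|_y^2}{1 - \|p\|_y}\right] + \delta_+\right\}, \tag{4.51}$$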
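where $p := y_+ - y$. Indeed, it follows from Lemma 4.6 a), which also holds for $\tilde d$ and $\tilde d_\delta$ since both are obtained by scaling $d$ and $d_\delta$ with the same factor $t^{-1}$, that

$$\begin{aligned}
\tilde\lambda_+ = |||\nabla\tilde d_\delta(y_+,t_+)|||^*_{y_+} &= \big[\nabla\tilde d_\delta(y_+,t_+)^T\nabla^2\tilde d_\delta(y_+,t_+)^{-1}\nabla\tilde d_\delta(y_+,t_+)\big]^{1/2}\\
&\le \frac{1}{1-\delta_+}\big[\nabla\tilde d_\delta(y_+,t_+)^T\nabla^2\tilde d(y_+,t_+)^{-1}\nabla\tilde d_\delta(y_+,t_+)\big]^{1/2} = \frac{\|\nabla\tilde d_\delta(y_+,t_+)\|^*_{y_+}}{1-\delta_+}.
\end{aligned} \tag{4.52}$$

Next, using (4.39), we have

$$\|\nabla\tilde d_\delta(y_+,t_+)\|^*_{y_+} \le \|\nabla\tilde d(y_+,t_+)\|^*_{y_+} + \|\nabla\tilde d_\delta(y_+,t_+) - \nabla\tilde d(y_+,t_+)\|^*_{y_+} \le \|\nabla\tilde d(y_+,t_+)\|^*_{y_+} + \delta_+. \tag{4.53}$$

Since $\tilde d(\cdot,t_+)$ is standard self-concordant according to Lemma 3.4, one has

$$\|\nabla\tilde d(y_+,t_+)\|^*_{y_+} \le \frac{\|\nabla\tilde d(y_+,t_+)\|^*_y}{1 - \|p\|_y}. \tag{4.54}$$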
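Plugging (4.54) and (4.53) into (4.52), we obtain

$$\tilde\lambda_+ \le \frac{1}{1-\delta_+}\left[\frac{\|\nabla\tilde d(y_+,t_+)\|^*_y}{1 - \|p\|_y} + \delta_+\right]. \tag{4.55}$$

On the other hand, from (4.13), we have

$$\begin{aligned}
\nabla\tilde d(y_+,t_+) &= \nabla\tilde d(y_+,t_+) - [\nabla\tilde d_\delta(y,t_+) + \nabla^2\tilde d_\delta(y,t_+)(y_+ - y)]\\
&= [\nabla\tilde d(y,t_+) - \nabla\tilde d_\delta(y,t_+)] + [\nabla^2\tilde d(y,t_+) - \nabla^2\tilde d_\delta(y,t_+)](y_+ - y)\\
&\quad + [\nabla\tilde d(y_+,t_+) - \nabla\tilde d(y,t_+) - \nabla^2\tilde d(y,t_+)(y_+ - y)].
\end{aligned} \tag{4.56}$$

By substituting $t$ by $t_+$ in (4.39), we obtain an estimate of the first term of (4.56) as

$$\|\nabla\tilde d(y,t_+) - \nabla\tilde d_\delta(y,t_+)\|^*_y \le \|x_{\delta 1} - x^*_1\|_{x^*_1} = \delta_1. \tag{4.57}$$

Next, we consider the second term of (4.56). Similarly to Lemma 4.6 a), we have

$$\big[(1-\delta_1)^2 - 1\big]\nabla^2\tilde d(y,t_+) \preceq \nabla^2\tilde d_\delta(y,t_+) - \nabla^2\tilde d(y,t_+) \preceq \big[(1-\delta_1)^{-2} - 1\big]\nabla^2\tilde d(y,t_+). \tag{4.58}$$

If we define $G := \nabla^2\tilde d_\delta(y,t_+) - \nabla^2\tilde d(y,t_+)$ and $H := \nabla^2\tilde d(y,t_+)^{-1/2}G\,\nabla^2\tilde d(y,t_+)^{-1/2}$, then

$$\|[\nabla^2\tilde d(y,t_+) - \nabla^2\tilde d_\delta(y,t_+)](y_+ - y)\|^*_y = \|Gp\|^*_y \le \|H\|\,\|p\|_y, \tag{4.59}$$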
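where, by virtue of (4.58) and the condition $\delta_1 < 1$, one has

$$\|H\| \le \max\big\{1 - (1-\delta_1)^2,\ (1-\delta_1)^{-2} - 1\big\} = \frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2}.$$

Hence, (4.59) leads to

$$\|[\nabla^2\tilde d(y,t_+) - \nabla^2\tilde d_\delta(y,t_+)](y_+ - y)\|^*_y \le \frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2}\|p\|_y. \tag{4.60}$$

Furthermore, since $\tilde d(\cdot,t_+)$ is standard self-concordant, similarly to the proof of [17, Theorem 4.1.14], we have

$$\|\nabla\tilde d(y_+,t_+) - \nabla\tilde d(y,t_+) - \nabla^2\tilde d(y,t_+)(y_+ - y)\|^*_y \le \frac{\|p\|_y^2}{1 - \|p\|_y}. \tag{4.61}$$

Now, we apply the triangle inequality $\|a + b + c\|^*_y \le \|a\|^*_y + \|b\|^*_y + \|c\|^*_y$ to (4.56) and then plug (4.57), (4.60) and (4.61) into the resulting inequality to obtain

$$\|\nabla\tilde d(y_+,t_+)\|^*_y \le \delta_1 + \frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2}\|p\|_y + \frac{\|p\|_y^2}{1 - \|p\|_y}.$$

Finally, by substituting this inequality into (4.55), we get (4.51).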
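Step 2. Next, we estimate (4.51) in terms of $\tilde\lambda_1$ to obtain

$$\tilde\lambda_+ \le \frac{1}{1-\delta_+}\left[\Big(\frac{\tilde\lambda_1}{1 - \delta_1 - \tilde\lambda_1}\Big)^2 + \frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2}\,\frac{\tilde\lambda_1}{1 - \delta_1 - \tilde\lambda_1} + \frac{\delta_1(1-\delta_1)}{1 - \delta_1 - \tilde\lambda_1} + \delta_+\right]. \tag{4.62}$$

Indeed, by using (4.42) with $x = x^*_1$ and $z = x_{\delta 1}$ and then (3.10), we have

$$(1-\delta_1)^2\nabla^2\tilde d_\delta(y,t_+) \preceq \nabla^2\tilde d(y,t_+) \preceq (1-\delta_1)^{-2}\nabla^2\tilde d_\delta(y,t_+).$$

This inequality, together with the definition of $|||\cdot|||$, implies

$$(1-\delta_1)|||p|||_y \le \|p\|_y = [p^T\nabla^2\tilde d(y,t_+)p]^{1/2} \le (1-\delta_1)^{-1}|||p|||_y.$$

Moreover, since $|||p|||_y = |||\nabla\tilde d_\delta(y,t_+)|||^*_y = \tilde\lambda_1$ due to (4.13), the last inequality implies

$$\|p\|_y \le \frac{\tilde\lambda_1}{1-\delta_1}. \tag{4.63}$$

Note that the right-hand side of (4.51) is nondecreasing w.r.t. $\|p\|_y$ in $[0,1)$. Substituting (4.63) into (4.51), we finally obtain (4.62).

Step 3. We further estimate (4.62) in terms of $\tilde\lambda$ and $\Delta$. First, we can easily check that the right-hand side of (4.62) is nondecreasing with respect to $\tilde\lambda_1$, $\delta_1$ and $\delta_+$. Now, by using the definitions of $\tilde\lambda$ and $\Delta$, it follows from Lemma 4.6 c) that

$$\tilde\lambda_1 \le \frac{\tilde\lambda + \Delta}{1 - \Delta}.$$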
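Since $\delta_+ < 1$ and $\delta_1 + 2\Delta + \tilde\lambda < 1$, substituting this inequality into (4.62), we obtain

$$\tilde\lambda_+ \le \frac{1}{1-\delta_+}\left[\Big(\frac{\tilde\lambda + \Delta}{1 - \delta_1 - 2\Delta - \tilde\lambda}\Big)^2 + \frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2}\,\frac{\tilde\lambda + \Delta}{1 - \delta_1 - 2\Delta - \tilde\lambda} + \frac{\delta_1(1-\delta_1)(1-\Delta)}{1 - \delta_1 - 2\Delta - \tilde\lambda} + \delta_+\right]. \tag{4.64}$$

The right-hand side of (4.64) is well-defined and also nondecreasing with respect to all variables.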

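Step 4. Finally, we simplify the right-hand side of (4.64) to obtain (4.14). Since $\tilde\lambda \ge 0$, we have

$$\begin{aligned}
(1-\delta_1)(1-\Delta) &= 1 - \Delta - \delta_1 + \delta_1\Delta = [1 - \delta_1 - 2\Delta - \tilde\lambda] + (\Delta + \tilde\lambda) + \delta_1\Delta\\
&\le [1 - \delta_1 - 2\Delta - \tilde\lambda] + (\tilde\lambda + \Delta) + \delta_1(\tilde\lambda + \Delta)\\
&= [1 - \delta_1 - 2\Delta - \tilde\lambda] + (1 + \delta_1)(\tilde\lambda + \Delta).
\end{aligned}$$

Therefore, this inequality implies

$$\frac{\delta_1(1-\delta_1)(1-\Delta)}{1 - \delta_1 - 2\Delta - \tilde\lambda} \le \delta_1 + \delta_1(1 + \delta_1)\,\frac{\tilde\lambda + \Delta}{1 - \delta_1 - 2\Delta - \tilde\lambda}. \tag{4.65}$$

Alternatively, since $0 \le \delta_1 < 1$, we have $1 + \delta_1 \le \frac{1}{1-\delta_1}$. Thus

$$\frac{2\delta_1 - \delta_1^2}{(1-\delta_1)^2} + \delta_1(1 + \delta_1) = \delta_1\left[\frac{1}{(1-\delta_1)^2} + \frac{1}{1-\delta_1} + 1 + \delta_1\right] \le \delta_1\left[\frac{1}{(1-\delta_1)^2} + \frac{2}{1-\delta_1}\right].$$

Substituting inequality (4.65) into (4.64) and then using the last inequality and $\xi := \frac{\Delta + \tilde\lambda}{1 - \delta_1 - 2\Delta - \tilde\lambda}$, we obtain (4.14).

Step 5. The monotonicity of the right-hand side of (4.14) is obvious. The inequality (4.15) follows directly from (4.14) by noting that $\tilde\lambda_+ = \lambda_+$, $\Delta = \Delta^*$ and $\delta_+ = \delta_1 = 0$. $\square$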



5. Path-following decomposition algorithm with exact Newton iterations. In Algorithm 1, if we set $\delta = 0$, then this algorithm collapses to the ones considered in [TOl [T5J [HJ [551 US]. However, we emphasize the following points.

1. We consider this variant as a special case of the algorithm presented in the previous sections, which we call the path-following decomposition algorithm with exact Newton iterations.

2. In [TD1 E21 EH ES], since the primal subproblem (3.2) is solved exactly, the family $\{d(\cdot,t)\}_{t>0}$ of the smooth dual functions is strongly self-concordant due to the Legendre transformation. Consequently, the standard theory of interior point methods in [16] can be applied to minimize this function. In contrast to those works, in this paper we directly analyze the path-following iterations to select appropriate parameters for implementation.

Note that the radius of the neighbourhood of the analytic center in Algorithm 3 below is $\beta^* = \frac{1}{2}(3 - \sqrt{5}) \approx 0.381966$, compared to the one used in the literature, $\beta^* = 2 - \sqrt{3} \approx 0.26795$.
5.1. Analyzing the exact path-following iteration. Let us assume that the primal subproblem (3.2) is solved exactly, i.e. $\delta = 0$. Then we have $x_\delta = x^*$ and $\delta(x_\delta,x^*) = 0$ for all $y \in Y$ and $t > 0$. Consequently, $\Delta = \Delta^* = \|x^*(y,t_+) - x^*(y,t)\|_{x^*(y,t)}$. We consider one step of the path-following scheme with exact full-step Newton iterations:

$$\begin{aligned}
t_+ &:= t - \Delta t, \quad \Delta t > 0,\\
y_+ &:= y - \nabla^2 d(y,t_+)^{-1}\nabla d(y,t_+) = y - \nabla^2\tilde d(y,t_+)^{-1}\nabla\tilde d(y,t_+).
\end{aligned} \tag{5.1}$$

For the sake of notational simplicity, we denote $\lambda := \lambda_{\tilde d(\cdot,t)}(y)$, $\lambda_1 := \lambda_{\tilde d(\cdot,t_+)}(y)$ and $\lambda_+ := \lambda_{\tilde d(\cdot,t_+)}(y_+)$. It follows from (4.15) of Lemma 4.2 that

$$\lambda_+ \le \left(\frac{\lambda + \Delta^*}{1 - 2\Delta^* - \lambda}\right)^2. \tag{5.2}$$



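Now, we fix $\beta \in (0,1)$ such that $\lambda \le \beta$. We need to find a condition on $\Delta^*$ such that $\lambda_+ \le \beta$. Indeed, since the right-hand side of (5.2) is nondecreasing with respect to $\lambda$, it implies that $\lambda_+ \le \big(\frac{\beta + \Delta^*}{1 - 2\Delta^* - \beta}\big)^2$. Thus $\lambda_+ \le \beta$ if $\frac{\beta + \Delta^*}{1 - 2\Delta^* - \beta} \le \sqrt{\beta}$, which leads to

$$0 \le \Delta^* \le \bar\Delta^* := \frac{\sqrt{\beta}(1-\beta) - \beta}{1 + 2\sqrt{\beta}}, \tag{5.3}$$

provided that

$$0 < \beta < \beta^* := \frac{3 - \sqrt{5}}{2} \approx 0.381966. \tag{5.4}$$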
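The step from (5.2) to (5.3) and (5.4) is only a rearrangement; written out (for $1 - 2\Delta^* - \beta > 0$), it reads

$$\frac{\beta + \Delta^*}{1 - 2\Delta^* - \beta} \le \sqrt{\beta} \iff \beta + \Delta^* \le \sqrt{\beta}(1 - \beta) - 2\sqrt{\beta}\,\Delta^* \iff (1 + 2\sqrt{\beta})\,\Delta^* \le \sqrt{\beta}(1 - \beta) - \beta.$$

The right-hand side is positive if and only if $1 - \beta > \sqrt{\beta}$, i.e. $u^2 + u - 1 < 0$ for $u := \sqrt{\beta}$, which gives $\sqrt{\beta} < \frac{\sqrt{5} - 1}{2}$ and hence $\beta < \frac{(\sqrt{5} - 1)^2}{4} = \frac{3 - \sqrt{5}}{2}$.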
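Since $\Delta = \Delta^*$, according to (4.18) we can choose

$$\Delta t := \sigma t, \quad\text{where}\quad \sigma := \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*}. \tag{5.5}$$

Therefore, $t$ is updated by $t_+ := t - \Delta t = (1-\sigma)t$. Note that $t$ decreases linearly with the contraction factor $1 - \sigma$.

In particular, if we choose $\beta = \frac{\beta^*}{4} \approx 0.095492$, then $\bar\Delta^* \approx 0.113729$, which leads to

$$\sigma = \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*} \approx \frac{0.1137}{1.1137\sqrt{\nu} + 0.1137}.$$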
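The constants of the exact variant can be checked with a few lines of Python: the sketch below evaluates $\beta^*$ from (5.4), $\bar\Delta^*$ from (5.3) for $\beta = \beta^*/4$, and $\sigma$ from (5.5), and confirms that at $\Delta^* = \bar\Delta^*$ and $\lambda = \beta$ the bound (5.2) is tight, i.e. it returns exactly $\beta$. The choice $\nu = 100$ is an arbitrary sample value.

```python
import math

beta_star = (3.0 - math.sqrt(5.0)) / 2.0                    # (5.4)
beta = beta_star / 4.0
sb = math.sqrt(beta)
dbar_star = (sb * (1.0 - beta) - beta) / (1.0 + 2.0 * sb)   # (5.3)


def sigma(nu):
    """Contraction factor of (5.5) for a barrier parameter nu."""
    return dbar_star / (math.sqrt(nu) * (dbar_star + 1.0) + dbar_star)


# Right-hand side of (5.2) with lambda = beta and Delta* = dbar_star
ratio = (beta + dbar_star) / (1.0 - 2.0 * dbar_star - beta)
print(beta_star, beta, dbar_star, ratio ** 2, sigma(100))
```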

5.2. The algorithm and its convergence. Let us fix an initial value $t = t_0 > 0$ and $\beta \in (0,\beta^*)$, where $\beta^*$ is given in (5.4). First, we apply Phase 1 to find a starting point $y^0 \in Y$ such that $\lambda_0 := \lambda_{\tilde d(\cdot,t_0)}(y^0) \le \beta$. This phase is carried out by applying the damped Newton iteration scheme proposed in [17]. Then we perform the path-following algorithm. From Definition 3.8, we can see that if $t_k \le \varepsilon_d/\omega_*(\beta)$ then $y^k$ is a $2\varepsilon_d$-solution of (2.2). The algorithm is presented in detail as follows.



Algorithm 3. (Path-following algorithm with exact Newton iterations)

Initialization: Perform the following steps:

1. Fix a constant $\beta \in (0,\beta^*)$ (e.g. $\beta = \frac{1}{4}\beta^*$), where $\beta^* = \frac{3 - \sqrt{5}}{2} \approx 0.381966$.

2. Compute $\bar\Delta^* := \frac{\sqrt{\beta}(1-\beta) - \beta}{1 + 2\sqrt{\beta}}$ and $\sigma := \frac{\bar\Delta^*}{\sqrt{\nu}(\bar\Delta^* + 1) + \bar\Delta^*}$.
3. Fix a tolerance $\varepsilon_d > 0$ and choose an initial value $t_0 > 0$.
Phase 1. (Finding a starting point).

1. Choose an arbitrary starting point $y^{0,0} \in Y$.
For $j = 0, 1, \ldots$ do

1. Solve the primal subproblem (3.2) exactly in parallel to obtain $x^*(y^{0,j},t_0)$.

2. Evaluate $\nabla d(y^{0,j},t_0)$ and $\nabla^2 d(y^{0,j},t_0)$ by (3.10).

3. Compute the Newton decrement $\lambda_j := \lambda_{\tilde d(\cdot,t_0)}(y^{0,j})$.

4. If $\lambda_j \le \beta$ then set $y^0 := y^{0,j}$ and terminate.

5. Update $y^{0,j+1} := y^{0,j} - \alpha_j\nabla^2 d(y^{0,j},t_0)^{-1}\nabla d(y^{0,j},t_0)$, where the step size is $\alpha_j := \frac{1}{1 + \lambda_j} \in (0,1]$.

End of For.

Phase 2. (Path-following iterations).
For $k = 0, 1, \ldots$ do

1. If $t_k \le \varepsilon_d/\omega_*(\beta)$ then terminate.

2. Update $t_{k+1} := (1-\sigma)t_k$.

3. Solve the primal subproblem (3.2) exactly in parallel to obtain a solution $x^*(y^k,t_{k+1})$.

4. Evaluate $\nabla d(y^k,t_{k+1})$ and $\nabla^2 d(y^k,t_{k+1})$ by (3.10).

5. Update $y^{k+1} := y^k + \Delta y^k = y^k - \nabla^2 d(y^k,t_{k+1})^{-1}\nabla d(y^k,t_{k+1})$.
End of For.



As in Algorithm 1, the main computational tasks of this algorithm are Step 1 in Phase 1 and Step 3 in Phase 2, which can be carried out in parallel, and Step 5 in both phases, which requires a centralized computation to solve the linear system $\nabla^2 d(y^k, t_{k+1})\Delta y = -\nabla d(y^k, t_{k+1})$ (see Section 6). In an implementation, the primal subproblem cannot be solved exactly; it must instead be solved to a very high accuracy.
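To make the two phases concrete, here is a small, self-contained Python sketch of Algorithm 3 on a hypothetical toy instance (the data `A`, `b`, `c` and all helper names are ours). It assumes a linear objective with scalar components $x_i \in (0,1)$ and the barrier $F_i(x_i) = -\ln x_i - \ln(1-x_i)$, so that each primal subproblem (3.2) has a closed-form solution. The gradient is $Ax^*(y,t) - b$, the Hessian is $t^{-1}\sum_i A_i\nabla^2 F_i(x_i^*)^{-1}A_i^T$, and the Newton decrement is taken as that of the scaled function $t^{-1}d(\cdot,t)$, which is an assumption about the normalization. A fixed threshold `t_stop` stands in for the stopping rule of Definition 3.8, and dense Gaussian elimination replaces the parallel linear algebra discussed in Section 6.

```python
import math

# Hypothetical toy instance (not from the paper): max c^T x s.t. A x = b, 0 < x_i < 1.
# Each column is one scalar component with barrier F_i(x) = -ln x - ln(1 - x), nu_i = 2.
A = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 1.0]]
c = [1.0, -0.5, 0.3, 0.8]
x_feas = [0.3, 0.4, 0.5, 0.6]                        # strictly feasible point
b = [sum(a * x for a, x in zip(row, x_feas)) for row in A]
m, n = len(A), len(c)
nu = 2.0 * n                                         # parameter of F = sum_i F_i


def solve(M, rhs):
    """Gaussian elimination with partial pivoting (enough for tiny dense systems)."""
    k = len(rhs)
    a = [M[i][:] + [rhs[i]] for i in range(k)]
    for col in range(k):
        p = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[p] = a[p], a[col]
        for i in range(col + 1, k):
            f = a[i][col] / a[col][col]
            for j in range(col, k + 1):
                a[i][j] -= f * a[col][j]
    z = [0.0] * k
    for i in reversed(range(k)):
        z[i] = (a[i][k] - sum(a[i][j] * z[j] for j in range(i + 1, k))) / a[i][i]
    return z


def newton_data(y, t):
    """Exact primal solves (closed form), then the Newton step and decrement at (y, t)."""
    g = [-bk for bk in b]                            # gradient A x*(y, t) - b
    H = [[0.0] * m for _ in range(m)]                # Hessian (1/t) sum_i a_i a_i^T / F_i''(x_i)
    for i in range(n):
        r = (c[i] + sum(A[k][i] * y[k] for k in range(m))) / t
        z = 2.0 / (2.0 + abs(r) + math.sqrt(r * r + 4.0))   # distance of root of F_i'(x) = r to nearest bound
        x, xc = (z, 1.0 - z) if r <= 0 else (1.0 - z, z)     # x and 1 - x, computed stably
        w = 1.0 / (t * (1.0 / x**2 + 1.0 / xc**2))
        for k in range(m):
            g[k] += A[k][i] * x
            for j in range(m):
                H[k][j] += w * A[k][i] * A[j][i]
    dy = solve(H, [-gk for gk in g])
    lam = math.sqrt(max(0.0, -sum(gk * dk for gk, dk in zip(g, dy))) / t)
    return dy, lam


def algorithm3(t0=1.0, t_stop=1e-4, max_phase1=500):
    beta = (3.0 - math.sqrt(5.0)) / 8.0              # beta = beta*/4
    sb = math.sqrt(beta)
    delta_star = sb * (1.0 - sb - beta) / (1.0 + 2.0 * sb)
    sigma = delta_star / (math.sqrt(nu) + (math.sqrt(nu) + 1.0) * delta_star)
    y = [0.0] * m
    for _ in range(max_phase1):                      # Phase 1: damped Newton at fixed t0
        dy, lam0 = newton_data(y, t0)
        if lam0 <= beta:
            break
        y = [yk + dk / (1.0 + lam0) for yk, dk in zip(y, dy)]
    else:
        raise RuntimeError("Phase 1 did not reach lambda <= beta")
    t, lams = t0, []
    while t > t_stop:                                # Phase 2: shrink t, one full Newton step
        t *= 1.0 - sigma
        dy, _ = newton_data(y, t)
        y = [yk + dk for yk, dk in zip(y, dy)]
        lams.append(newton_data(y, t)[1])            # monitoring only: decrement at new iterate
    return y, t, lam0, lams, beta


y, t, lam0, lams, beta = algorithm3()
print(len(lams), max(lams))
```

The extra call to `newton_data` after each Phase 2 step is only there to monitor that the decrement stays below $\beta$, as the analysis leading to (5.3) predicts; a production implementation would skip it.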

Since $d(\cdot, t_0)$ is standard self-concordant due to Lemma 3.4, by [17, Theorem 4.1.12] the number of iterations needed to obtain $y^0 \in Y$ such that $\lambda_{d(\cdot,t_0)}(y^0) \le \beta$ does not exceed

(5.6) $j_{\max} := \left\lfloor \dfrac{d(y^{0,0}, t_0) - d^*(t_0)}{t_0\,\omega(\beta)} \right\rfloor + 1$,

where $\omega(\tau) := \tau - \ln(1+\tau)$. The number $j_{\max}$ depends not only on the distance $d(y^{0,0}, t_0) - d^*(t_0)$ but also on $t_0$. If we choose $t_0$ small, then $j_{\max}$ is large, while the number of iterations in Phase 2 of Algorithm 3 is small. Therefore, in an implementation, we need to balance these quantities to obtain good performance.

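The convergence of Phase 2 in Algorithm 3 is stated in the following theorem.

Theorem 5.1. Let $t_0 > 0$ and $y^0 \in Y$ be such that $\lambda_{d(\cdot,t_0)}(y^0) \le \beta$, and let $\bar{t} > 0$ denote the stopping threshold on $t$ used in Step 1 of Phase 2. Then the maximum number of iterations $k$ needed by Algorithm 3 to obtain an $\varepsilon_d$-solution $y^k$ of (2.2) does not exceed

(5.7) $\bar{k} := \left\lceil \dfrac{\ln(t_0/\bar{t})}{\ln\left(1 + \frac{\Delta^*}{\sqrt{\nu}(\Delta^*+1)}\right)} \right\rceil$,

where $\Delta^*$ is defined by (5.3).

Proof. From Step 2 of Algorithm 3, we have $t_k = (1-\sigma)^k t_0 = \left(1 + \frac{\Delta^*}{\sqrt{\nu}(\Delta^*+1)}\right)^{-k} t_0$. Algorithm 3 terminates as soon as $t_k \le \bar{t}$. Thus it suffices that $\left(1 + \frac{\Delta^*}{\sqrt{\nu}(\Delta^*+1)}\right)^{-k} t_0 \le \bar{t}$, which leads to (5.7). □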

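Remark 7 (The worst-case complexity). Since $\ln\left(1 + \frac{\Delta^*}{\sqrt{\nu}(\Delta^*+1)}\right) \approx \frac{\Delta^*}{\sqrt{\nu}(\Delta^*+1)}$, the worst-case complexity of Algorithm 3 is $O(\sqrt{\nu}\ln(t_0/\varepsilon_d))$.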
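The bound (5.7) is easy to tabulate. The Python sketch below uses a hypothetical helper `phase2_iterations`, with `t_stop` standing in for the stopping threshold on $t$, and evaluates the bound for $\beta = \beta^*/4$. It illustrates Remark 7: multiplying $\nu$ by 100 multiplies the iteration count by roughly $\sqrt{100} = 10$.

```python
import math


def phase2_iterations(nu, t0, t_stop, beta=(3.0 - math.sqrt(5.0)) / 8.0):
    """Bound (5.7): smallest k with (1 - sigma)^k t0 <= t_stop (default beta = beta*/4)."""
    sb = math.sqrt(beta)
    delta_star = sb * (1.0 - sb - beta) / (1.0 + 2.0 * sb)
    rate = math.log1p(delta_star / (math.sqrt(nu) * (delta_star + 1.0)))   # = -ln(1 - sigma)
    return math.ceil(math.log(t0 / t_stop) / rate)


for nu in (8, 800, 80000):
    k = phase2_iterations(nu, t0=1.0, t_stop=1e-4)
    print(nu, k, k / math.sqrt(nu))   # k / sqrt(nu) levels off, in line with Remark 7
```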

Remark 8 (Damped Newton iteration). Note that, at Step 5 of Phase 2 in Algorithm 3, we can use a damped Newton iteration $y^{k+1} := y^k - \alpha_k \nabla^2 d(y^k, t_{k+1})^{-1}\nabla d(y^k, t_{k+1})$ instead of the full-step Newton iteration, where $\alpha_k := (1 + \lambda_{d(\cdot,t_{k+1})}(y^k))^{-1}$. In this case, with the same argument as before, we obtain $\beta^* = 0.5$ and a corresponding value of $\Delta^*$.

6. Discussion on implementation. In this section, we first show how to handle a general concave objective function. Next, we discuss how to solve the primal subproblem (3.2), including local equality constraints. Finally, we briefly describe a parallel method to compute the Newton-type direction for the master problem.

6.1. Handling general objective function. If $f_i$ is nonlinear and concave, and its epigraph is endowed with a self-concordant barrier for some $i \in \mathcal{I}_M := \{1, \dots, M\}$, then we propose to use slack variables to move the objective function into the constraints. Let us denote $\hat{x}_i := (x_i^T, s_i)^T$ and

$\hat{X}_i := \{(x_i, s_i) \mid x_i \in X_i,\ s_i \ge \underline{s}_i,\ f_i(x_i) \ge s_i\}$,

where $\underline{s}_i$ is a sufficiently small value such that the constraint $s_i \ge \underline{s}_i$ is inactive. Let $\hat{F}_i$ be a self-concordant barrier of $\hat{X}_i$ and let $\hat{c}_i := (0^T, 1)^T \in \mathbb{R}^{n_i+1}$. Then problem (1.1) can be transformed into a convex separable optimization problem with a linear objective function, to which the algorithms developed in the previous sections can be applied.

If $f_i$ is concave quadratic, then, according to [16, Theorem 3.3.1], we can construct a self-concordant barrier $G_i(\hat{x}_i) := -\ln(f_i(x_i) - s_i)$ for the epigraph of $f_i$. In this case, the optimality condition for the transformed problem is $\hat{c} + A^T y - t\nabla\hat{F}(\hat{x}) = 0$, which can be written componentwise as

$A_i^T y - t\nabla F_i(x_i) + t(f_i(x_i) - s_i)^{-1}\nabla f_i(x_i) = 0$,
$t(f_i(x_i) - s_i)^{-1} = 1$, $\quad i \in \mathcal{I}_M$.

By substituting the second line into the first line of the above expression, we obtain

$A^T y - t\nabla F(x) + \nabla f(x) = 0$.

This condition is exactly the optimality condition of the problem

(6.1) $d(y,t) := \max_{x \in \mathrm{int}(X)} \{ f(x) + y^T(Ax - b) - t[F(x) - F(x^c)] \}$.

Consequently, the algorithms developed in the previous sections can be applied to solve (1.1) without moving $f_i$ into the constraints.

Several examples of convex problems for which the logarithmic function $G_i(\hat{x}_i)$ is self-concordant can be found in [8]. Note that, in some problems, we may need to reformulate the epigraph of $f_i$ to obtain a self-concordant barrier. For example, many optimization problems in networks use an objective function of the form $f_i(x_i) = \frac{x_i}{1-x_i}$, where $0 \le x_i < 1$. The inequality representing the epigraph of $f_i$ is $\frac{x_i}{1-x_i} \le s_i$, which is equivalent to $\sqrt{(x_i + s_i)^2 + 4} \le 2 - x_i + s_i$. The last inequality is a second-order cone constraint, which is endowed with a 2-self-concordant barrier [16].
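The equivalence follows by setting $u := 1 - x_i > 0$ and $v := 1 + s_i$. Then $x_i/(1-x_i) \le s_i$ reads $uv \ge 1$, i.e. $(u+v)^2 - (u-v)^2 \ge 4$. Since $uv \ge 1$ with $u > 0$ forces $u + v > 0$, and $u + v = 2 - x_i + s_i$, $u - v = -(x_i + s_i)$, this is the cone inequality above. The following grid check in Python is only an illustration, not part of the paper:

```python
import math


def in_epigraph(x, s):
    """Direct test of x / (1 - x) <= s for 0 <= x < 1."""
    return x / (1.0 - x) <= s


def in_cone(x, s):
    """Second-order cone form sqrt((x + s)^2 + 4) <= 2 - x + s."""
    return math.sqrt((x + s) ** 2 + 4.0) <= 2.0 - x + s


# Compare both tests on a grid, skipping points that lie numerically on the boundary.
mismatches = [
    (x, s)
    for x in [i / 20.0 for i in range(20)]               # 0 <= x < 1
    for s in [-3.0 + 0.37 * j for j in range(60)]
    if abs(x / (1.0 - x) - s) > 1e-9 and in_epigraph(x, s) != in_cone(x, s)
]
print(len(mismatches))
```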
6.2. Solving the primal subproblems. Let us recall the primal subproblem in (6.1) with a nonlinear objective function. We need to solve this problem inexactly up to a desired accuracy $\varepsilon(t) > 0$. Note that the approximate optimality condition of (3.4) becomes

(6.2) $\|\nabla f(x) + A^T y - t\nabla F(x)\|^*_{x^c} \le \varepsilon(t)$.

By separability, this approximate problem can be solved in parallel as

(6.3) $\|\nabla f_i(x_i) + A_i^T y - t\nabla F_i(x_i)\|^*_{x_i^c} \le \varepsilon_i(t)$, $\quad i = 1, \dots, M$,

where $\sum_{i=1}^M \varepsilon_i(t) = \varepsilon(t)$. In principle, we can choose $\varepsilon_i(t) = \varepsilon(t)/M$. However, in some practical situations it is important to choose different $\varepsilon_i(t)$ for different components, especially when some of the component problems can be solved analytically in closed form.

Since $F_i$ is standard self-concordant, the function $\psi_i(x_i; y, t) := F_i(x_i) - t^{-1}(f_i(x_i) + y^T A_i x_i)$ is also standard self-concordant. Moreover, $\nabla\psi_i(x_i; y, t) = \nabla F_i(x_i) - t^{-1}(\nabla f_i(x_i) + A_i^T y)$ and $\nabla^2\psi_i(x_i; y, t) = \nabla^2 F_i(x_i) - t^{-1}\nabla^2 f_i(x_i)$. Since $\nabla^2\psi_i(x_i; y, t) \succ 0$, we define

$\lambda_{\psi_i}(x_i) := \left[\nabla\psi_i(x_i; y, t)^T \nabla^2\psi_i(x_i; y, t)^{-1}\nabla\psi_i(x_i; y, t)\right]^{1/2}$,

the Newton decrement of $\psi_i$.

Now, let us apply Newton's method to solve problem (6.2). First, we fix $\beta_i \in (0, \beta^*)$, where $\beta^* := \frac{1}{2}(3 - \sqrt{5})$, and choose $x_i^0 \in \mathrm{int}(X_i)$. Then, we generate a sequence $\{x_i^j\}_{j \ge 0}$ as

(6.4) $x_i^{j+1} := x_i^j + \alpha_j\Delta x_i^j$, where $\Delta x_i^j := -\nabla^2\psi_i(x_i^j; y, t)^{-1}\nabla\psi_i(x_i^j; y, t)$ and $\alpha_j \in (0, 1]$.

Theoretically, the step size $\alpha_j$ can be chosen as $\alpha_j := 1$ if $\lambda_{\psi_i}(x_i^j) \le \beta_i$ and $\alpha_j := (1 + \lambda_{\psi_i}(x_i^j))^{-1}$ otherwise. However, this choice is usually too conservative in practice, so one can instead use an appropriate line-search procedure to select $\alpha_j$. Note that in linear programming, $\nabla^2 F_i$ is diagonal, e.g., for $F_i(x_i) = -\sum_l \ln (x_i)_l$, so that computing the Newton iteration (6.4) has a low computational cost. In general, we have to solve a linear system of the form

$\nabla^2\psi_i(x_i^j; y, t)\Delta x_i^j = -\nabla\psi_i(x_i^j; y, t)$

to obtain the Newton direction $\Delta x_i^j$. The convergence of the Newton scheme (6.4) is analyzed in [17]. Note that in Algorithms 1 and 2, (6.3) is solved repeatedly at different values $t_k$. It is therefore important to warm-start the Newton iteration (6.4) at $t_k$ from the final approximate solution obtained at the previous value $t_{k-1}$.
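As an illustration of (6.4) with the theoretical step-size rule and warm starting, consider the hypothetical scalar case $X_i = [0,1]$, $F_i(x_i) = -\ln x_i - \ln(1-x_i)$ and a linear $f_i$, so that $\psi_i(x_i; y, t) = F_i(x_i) - r x_i$ with $r := (c_i + A_i^T y)/t$. The Python sketch below (names are ours) takes the full step when $\lambda_{\psi_i} \le \beta_i$ and the damped step $1/(1+\lambda_{\psi_i})$ otherwise, and reuses the last iterate as the starting point for the next, smaller $t$. For this toy barrier the exact minimizer is $2/(2 - r + \sqrt{r^2+4})$, which can serve as a reference.

```python
import math


def newton_subproblem(r, x0, beta_i=0.25, tol=1e-12, max_iter=100):
    """Newton scheme (6.4) for psi(x) = -ln x - ln(1 - x) - r x on (0, 1) (toy assumption)."""
    x = x0
    for it in range(1, max_iter + 1):
        grad = -1.0 / x + 1.0 / (1.0 - x) - r
        hess = 1.0 / x**2 + 1.0 / (1.0 - x) ** 2
        lam = abs(grad) / math.sqrt(hess)          # Newton decrement lambda_psi(x)
        if lam <= tol:
            return x, it
        alpha = 1.0 if lam <= beta_i else 1.0 / (1.0 + lam)   # theoretical step-size rule
        x -= alpha * grad / hess                   # local step length alpha * lam < 1 keeps x in (0, 1)
    raise RuntimeError("Newton scheme did not converge")


# Warm start: reuse the last iterate when t decreases (hypothetical s = c_i + A_i^T y).
s, x, total_iters = 0.7, 0.5, 0
for t in [1.0, 0.5, 0.25, 0.125]:
    x, iters = newton_subproblem(s / t, x)
    total_iters += iters
print(x, total_iters)
```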

Finally, if local equality constraints $E_i x_i = f_i$ are present in (1.1) for some $i \in \{1, \dots, M\}$, then the KKT condition of the $i$-th primal subproblem becomes

(6.5) $c_i + A_i^T y + E_i^T z_i - t\nabla F_i(x_i) = 0$, $\quad E_i x_i - f_i = 0$.

Instead of the full KKT system (6.5), we consider the reduced KKT condition

(6.6) $Z_i^T(c_i + A_i^T y) - tZ_i^T\nabla F_i(Z_i x_i^z + Y_i R_i^{-T} f_i) = 0$.

Here, $(Q_i, R_i)$ is a QR factorization of $E_i^T$ and $[Y_i, Z_i] = Q_i$, where the columns of $Y_i$ and $Z_i$ form bases of the range space of $E_i^T$ and of the null space of $E_i$, respectively. Due to the invariance of the local norm under the change of variables $x_i = Z_i x_i^z + Y_i R_i^{-T} f_i$, we can show that $\|x_i^\delta - x_i^*\|_{x_i^*} = \|x_i^{z,\delta} - x_i^{z,*}\|_{x_i^{z,*}}$. Therefore, the condition (4.1) coincides with $\|x_i^{z,\delta} - x_i^{z,*}\|_{x_i^{z,*}} \le \delta$. Moreover, the last condition is satisfied if

(6.7) $\|Z_i^T(c_i + A_i^T y) - tZ_i^T\nabla F_i(Z_i x_i^z + Y_i R_i^{-T} f_i)\|^*_{x_i^z} \le \varepsilon_i$,

where $\sum_{i=1}^M \varepsilon_i = \varepsilon_p$ and $\varepsilon_p$ is defined by (4.9). Note that the QR factorization of $E_i^T$ can be computed once, a priori.

6.3. Computing the inexact perturbed Newton direction. Let us rewrite the inexact perturbed Newton direction in Algorithms 1 and 2 in a unified formula:

$\Delta y^k := -\nabla^2 d_\delta(y^k, t)^{-1}\nabla d_\delta(y^k, t)$,

where $t$ can be $t_{k+1}$ or $t_0$. In this subsection, we discuss how to compute $\Delta y^k$ efficiently by exploiting the specific structure of problem (1.1). Note that $\Delta y^k$ is the solution of the linear system

(6.8) $\nabla^2 d_\delta(y^k, t)\Delta y^k = -\nabla d_\delta(y^k, t)$.

The gradient vector $\nabla d_\delta(y^k, t)$ is computed as

$\nabla d_\delta(y^k, t) = Ax_\delta(y^k, t) - b = \sum_{i=1}^M A_i x_i^\delta(y^k, t) - b =: g^k$,

and the Hessian matrix $\nabla^2 d_\delta(y^k, t)$ is obtained from

$\nabla^2 d_\delta(y^k, t) = \frac{1}{t}\sum_{i=1}^M A_i\nabla^2 F_i(x_i^\delta(y^k, t))^{-1}A_i^T =: \sum_{i=1}^M A_i G_i^k A_i^T$.

Note that each block $A_i x_i^\delta(y^k, t)$, as well as $A_i\nabla^2 F_i(x_i^\delta(y^k, t))^{-1}A_i^T$, can be computed in parallel. Then, the linear system (6.8) can be written as

(6.9) $\left(\sum_{i=1}^M A_i G_i^k A_i^T\right)\Delta y^k = -g^k$.

Since $G_i^k \succ 0$ and $\sum_{i=1}^M A_i G_i^k A_i^T \succ 0$, one can apply either Cholesky-type factorizations or conjugate gradient (CG) methods to solve this system. Note that the CG method only requires matrix-vector products. More details on the parallel solution of (6.8) can be found, e.g., in [15, 29].
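Since CG needs only products with $\sum_i A_i G_i^k A_i^T$, this matrix never has to be formed explicitly. Each block contributes $A_i(G_i^k(A_i^T v))$, and these contributions can be computed in parallel and summed. The Python sketch below illustrates this matrix-free idea with plain (unpreconditioned) CG on hypothetical data. It assumes diagonal $G_i^k$, as in the linear programming case, and is not the parallel solver of [15, 29].

```python
import math


def cg(matvec, rhs, tol=1e-12, max_iter=None):
    """Unpreconditioned conjugate gradient for a symmetric positive definite operator."""
    x = [0.0] * len(rhs)
    res = rhs[:]                                   # residual rhs - M x for x = 0
    p = res[:]
    rs = sum(v * v for v in res)
    for _ in range(max_iter or 10 * len(rhs)):
        if math.sqrt(rs) <= tol:
            break
        mp = matvec(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        res = [ri - alpha * qi for ri, qi in zip(res, mp)]
        rs_new = sum(v * v for v in res)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(res, p)]
        rs = rs_new
    return x


# Hypothetical blocks (A_i, diag(G_i)): each A_i is m x n_i with m = 2, n_i = 2.
blocks = [
    ([[1.0, 0.0], [0.5, 1.0]], [0.2, 0.4]),
    ([[0.0, 2.0], [1.0, 1.0]], [0.1, 0.3]),
    ([[1.0, 1.0], [0.0, 1.0]], [0.5, 0.5]),
]


def hessian_matvec(v):
    """Return (sum_i A_i G_i A_i^T) v; the loop body is independent across blocks."""
    out = [0.0] * len(v)
    for Ai, Gi in blocks:
        w = [sum(Ai[row][j] * v[row] for row in range(len(v))) for j in range(len(Gi))]  # A_i^T v
        w = [gj * wj for gj, wj in zip(Gi, w)]                                            # G_i (.)
        for row in range(len(v)):
            out[row] += sum(Ai[row][j] * w[j] for j in range(len(w)))                     # A_i (.)
    return out


g = [0.3, -0.7]                                    # hypothetical gradient g^k
dy = cg(hessian_matvec, [-gi for gi in g])         # solves (6.9)
```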



7. Numerical Tests. In this section, we test the algorithms developed in the previous sections by solving a routing problem with congestion cost. This problem appears in telecommunications and in other network flow problems such as transportation [9]. Let us consider a network $\mathcal{G} = (\mathcal{N}, \mathcal{A})$, where $\mathcal{N}$ is the set of nodes and $\mathcal{A}$ is the set of links. Let $\mathcal{C}$ be a set of commodities to be sent through the network $\mathcal{G}$. Each commodity $k \in \mathcal{C}$ has a source $s_k \in \mathcal{N}$, a destination $d_k \in \mathcal{N}$ and a certain amount of demand $d_k > 0$. Each link $(i,j) \in \mathcal{A}$ has a maximum capacity $b_{ij} > 0$, up to which no congestion is assumed to occur, and a linear cost per unit $c_{ij}$. The variable $u_{ijk}$ denotes the amount of commodity $k$ that is sent through the link $(i,j)$. Flow exceeding $b_{ij}$ may be sent through the link, but then causes congestion with an additional nonlinear cost function $g_{ij}$ depending on the excess value $v_{ij}$, which is treated as a variable. We denote by $\mathcal{N}_s$ and $\mathcal{N}_d$ the sets of sources and destinations, respectively. Let $\mathcal{N}_c := \mathcal{N} \setminus (\mathcal{N}_s \cup \mathcal{N}_d)$ and assume that each node in $\mathcal{N}_c$ has at least one ingoing link and one outgoing link.

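Mathematically, the optimization model of the routing problem with congestion (RPC) can be formulated as follows (see, e.g., [9]):

(7.1)
$\min_{u_{ijk},\,v_{ij}} \sum_{k \in \mathcal{C}}\sum_{(i,j)\in\mathcal{A}} c_{ij}u_{ijk} + \sum_{(i,j)\in\mathcal{A}} w_{ij}\,g_{ij}(v_{ij})$
s.t. $\sum_{j:(i,j)\in\mathcal{A}} u_{ijk} - \sum_{j:(j,i)\in\mathcal{A}} u_{jik} = \begin{cases} d_k & \text{if } i \in \mathcal{N}_s,\\ -d_k & \text{if } i \in \mathcal{N}_d,\\ 0 & \text{otherwise},\end{cases} \quad i \in \mathcal{N},\ k \in \mathcal{C}$,
$\sum_{k\in\mathcal{C}} u_{ijk} - v_{ij} = b_{ij}, \quad (i,j) \in \mathcal{A}$,
$u_{ijk} \ge 0,\ v_{ij} \ge 0, \quad (i,j) \in \mathcal{A},\ k \in \mathcal{C}$,

where $w_{ij} > 0$ is the weighting of the additional cost function $g_{ij}$ for $(i,j) \in \mathcal{A}$.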

In this example, we assume that the additional cost function $g_{ij}$ is given by one of the following functions: a) $g_{ij}(v_{ij}) = -\ln(v_{ij})$, the logarithmic function, or b) $g_{ij}(v_{ij}) = v_{ij}\ln(v_{ij})$, the entropy function. With these choices, it was shown in [17] that the self-concordant barrier function corresponding to the epigraph

$\mathcal{E}_{g_{ij}} := \{(v_{ij}, s_{ij}) \in \mathbb{R}_+ \times \mathbb{R} \mid g_{ij}(v_{ij}) \le s_{ij}\}$

of $g_{ij}$ is given by: a) $F_{ij}(v_{ij}, s_{ij}) = -\ln v_{ij} - \ln(\ln v_{ij} + s_{ij})$ with parameter $\nu_{ij} = 2$, or b) $F_{ij}(v_{ij}, s_{ij}) = -\ln v_{ij} - \ln(s_{ij} - v_{ij}\ln v_{ij})$ with parameter $\nu_{ij} = 2$, respectively. Now, by using slack variables $s_{ij}$, we can move the nonlinear terms of the objective function to the constraints. The objective function of the resulting problem becomes

(7.2) $f(u, v, s) := \sum_{k\in\mathcal{C}}\sum_{(i,j)\in\mathcal{A}} c_{ij}u_{ijk} + \sum_{(i,j)\in\mathcal{A}} w_{ij}s_{ij}$,

with the additional constraints $g_{ij}(v_{ij}) \le s_{ij}$, $(i,j) \in \mathcal{A}$.

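It is clear that problem (7.1) is separable and convex with respect to $M$ components, $n$ variables $u_{ijk}$, $v_{ij}$ and $s_{ij}$, and $m$ coupling constraints, where $M := n_{\mathcal{A}}$, $n := n_{\mathcal{C}}n_{\mathcal{A}} + 2n_{\mathcal{A}}$ and $m := n_{\mathcal{C}}n_{\mathcal{N}}$, with $n_{\mathcal{A}} := |\mathcal{A}|$, $n_{\mathcal{C}} := |\mathcal{C}|$ and $n_{\mathcal{N}} := |\mathcal{N}|$. Let

(7.3) $X_{ij} := \left\{\left((u_{ijk})_{k\in\mathcal{C}}, v_{ij}, s_{ij}\right) \;\middle|\; u_{ijk} \ge 0,\ \sum_{k\in\mathcal{C}} u_{ijk} - v_{ij} = b_{ij},\ g_{ij}(v_{ij}) \le s_{ij},\ k \in \mathcal{C}\right\}, \quad (i,j) \in \mathcal{A}$.

Then problem (7.1) can be reformulated in the form of (1.1) with the linear objective function (7.2) and the local constraint sets (7.3). Note that each primal subproblem of the form (3.2) has $n_{\mathcal{C}} + 2$ variables and one equality constraint.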

The aim is to compare the effect of the parameters on the performance of the algorithms. We consider two variants of Algorithm 1, where we set $\delta = 0.5\delta^*$ and $\delta = 0.25\delta^*$ in Phase 1, and $\delta = 0.01$ and $\delta = 0.005$ in Phase 2, respectively. We denote these variants by A1-v1 and A1-v2, respectively. For Algorithm 3, we also consider two cases: in the first case, we set the tolerance of the primal subproblem to $\varepsilon_p = 10^{-6}$, and in the second case to $10^{-10}$; we call them A3-v1 and A3-v2, respectively. All variants are terminated with the same tolerance $\varepsilon_d = 10^{-4}$. The initial barrier parameter value is set to $t_0 := 0.25$.

The algorithms are implemented in C++ and run on a desktop PC with an Intel® Core(TM)2 Quad CPU Q6600 (2.4 GHz) and 3 GB of RAM. The algorithms are parallelized using OpenMP. The input data are generated randomly: the nodes of the network are generated in the rectangle $[0, 100] \times [0, 300]$, the demand $d_k$ is in $[50, 500]$, the weighting vector $w$ is set to 10, the congestion vector is in $[10, 100]$ and the linear cost $c_{ij}$ is the Euclidean length of the link $(i,j) \in \mathcal{A}$. The nonlinear cost function $g_{ij}$ is chosen randomly between the two functions a) and b) defined above.

We test the algorithms on a collection of 150 random problems, of which 108 are solved successfully. The size of these problems varies from $M = 6$ to 14,280 components, $n = 84$ to 77,142 variables and $m = 15$ to 500 coupling constraints.

The performance profiles are shown in Figures 7.1 and 7.2. The first figure shows the performance profiles of the four variants with respect to the total CPU time, the total time for solving the primal subproblems in both phases, and the CPU times of Phase 1 and Phase 2 separately (in seconds). As we can see from this figure, Algorithm 1 works better than Algorithm 3 in terms of the total CPU time and the CPU time for solving the primal subproblems. Moreover, the accuracy of solving the primal subproblems also affects the performance of the algorithms. We also observe that the number of iterations for solving the master problem in Phase 1 is almost the same for all four variants, while it differs in Phase 2. However, since Phase 2 is performed when the iterate is in the quadratic convergence region, it takes only a few steps to reach the desired approximate solution. Therefore, the computational time of Phase 1 dominates that of Phase 2. Moreover, since the master problem in this example is almost dense, we do not use any sparse linear algebra solver. Consequently, the algorithms developed in this paper are recommended for the class of problems with many variables and few coupling constraints, in the case where the master dual problem has a dense structure. In other applications, efficient methods for sparse linear algebra should be taken into account.

Fig. 7.1. The CPU time performance profile of four variants.
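The section does not spell out how the performance profiles are computed. A common construction is that of Dolan and Moré: for each problem $p$ and variant $s$ one forms the ratio of the cost of $s$ (CPU time or iterations) to the best cost over all variants, and $\rho_s(\tau)$ is the fraction of problems whose ratio is at most $\tau$. The Python sketch below assumes that construction and uses made-up numbers; failures are encoded as `math.inf`.

```python
import math


def performance_profile(costs, taus):
    """Dolan and More style profile.

    costs[p][s]: cost of variant s on problem p (math.inf marks a failure).
    Returns rho[s][k]: fraction of problems with cost / best cost <= taus[k].
    """
    n_prob, n_var = len(costs), len(costs[0])
    ratios = []
    for row in costs:
        best = min(row)
        # If every variant failed, inf / inf is nan, and nan <= tau is False.
        ratios.append([c / best for c in row])
    return [[sum(1 for p in range(n_prob) if ratios[p][s] <= tau) / n_prob for tau in taus]
            for s in range(n_var)]


# Made-up costs for 3 problems and 2 variants (not the paper's data).
costs = [[1.0, 2.0], [3.0, 1.5], [2.0, math.inf]]
rho = performance_profile(costs, taus=[1.0, 2.0, 4.0])
print(rho)
```

The value $\rho_s(1)$ is the fraction of problems on which variant $s$ is the fastest, and $\lim_{\tau\to\infty}\rho_s(\tau)$ is the fraction it solves at all.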
We also compare the total number of iterations for solving the primal subproblems in Figure 7.2. It can be seen from this figure that Algorithm 1 is superior to Algorithm 3 in terms of iterations, although the accuracy of solving the primal subproblem in Algorithm 3 is set to $10^{-6}$, which is not very high for interior point methods. The performance profiles also reveal the effect of the parameters on the number of iterations and the computational time. Consequently, in practice, it is valuable to carefully choose appropriate parameters for a specific implementation.

Fig. 7.2. The iteration performance profile of four variants (A1-v1, A1-v2, A3-v1, A3-v2); horizontal axis: total number of iterations for solving the primal subproblems.

8. Concluding remarks. We have proposed a smoothing technique for Lagrangian decomposition using self-concordant barriers in large-scale convex separable optimization. We provided global and local approximations to the dual function. Then, we proposed a path-following algorithm with inexact perturbed Newton iterations. The convergence of the algorithm has been analyzed and its complexity has been estimated. The theory presented in this paper is significant in practice, since it allows one to solve the primal subproblems inexactly. Moreover, it allows one to balance the accuracy of solving the primal subproblems against the convergence rate of the path-following algorithm. Even in the exact case, we obtained a direct convergence analysis for the path-following algorithm presented by Mehrotra et al. [12] and Shida [22]. The details of implementation and numerical tests have also been presented. Extensions to inexact linear algebra and to distributed implementations are interesting and significant directions for future research.
Appendix A. The proof of the technical statements. In this appendix, we provide complete proofs of Lemmas 3.1, 3.2 and 3.3.

A.1. The proof of Lemma 3.1. Proof. Since $F_i$ is standard self-concordant, according to [17, Theorem 4.1.7, inequality (4.1.8)] we have

$F_i(y_i) \ge F_i(x_i) + \nabla F_i(x_i)^T(y_i - x_i) + \omega(\|y_i - x_i\|_{x_i})$
$\ge F_i(x_i) - \|\nabla F_i(x_i)\|^*_{x_i}\|y_i - x_i\|_{x_i} + \omega(\|y_i - x_i\|_{x_i})$,

where $\omega(\tau) := \tau - \ln(1+\tau)$. This inequality implies

$F_i(x_i) - F_i(y_i) \le \|\nabla F_i(x_i)\|^*_{x_i}\|y_i - x_i\|_{x_i} - \omega(\|y_i - x_i\|_{x_i})$
$\le \max_{\xi \ge 0}\{\|\nabla F_i(x_i)\|^*_{x_i}\,\xi - \omega(\xi)\}$
$= \omega_*(\|\nabla F_i(x_i)\|^*_{x_i})$,

where $\omega_*(\tau) := -\tau - \ln(1-\tau)$. Here, the last equality follows from [17, Lemma 4.1.4] and the assumption that $\lambda_{F_i}(x_i^*(y,t)) < 1$. Using the above inequality with $y_i = x_i^c$ and $x_i = x_i^*(y,t)$, we have

(A.1) $F_i(x_i^*(y,t)) - F_i(x_i^c) \le \omega_*(\lambda_{F_i}(x_i^*(y,t)))$.

Now, we prove (3.5). Let $x_i(\alpha) := x_i^*(y,t) + \alpha(x_i^*(y) - x_i^*(y,t))$ with $\alpha \in [0,1)$. Since $x_i^*(y,t) \in \mathrm{int}(X_i)$ and $\alpha < 1$, we have $x_i(\alpha) \in \mathrm{int}(X_i)$. By applying [16, inequality (2.3.3)], we have

$F_i(x_i(\alpha)) \le F_i(x_i^*(y,t)) - \nu_i\ln(1-\alpha)$,

which is equivalent to

(A.2) $F_i(x_i(\alpha)) - F_i(x_i^c) \le F_i(x_i^*(y,t)) - F_i(x_i^c) - \nu_i\ln(1-\alpha)$.

Now, from the definition of $d_i(y,t)$, the concavity of $\phi_i$, (A.2) and (A.1), we have

$d_i(y,t) = \max_{x_i \in \mathrm{int}(X_i)}\{\phi_i(x_i) + y^T A_i x_i - t[F_i(x_i) - F_i(x_i^c)]\}$
$\ge \max_{\alpha \in [0,1)}\{\phi_i(x_i(\alpha)) + y^T A_i x_i(\alpha) - t[F_i(x_i(\alpha)) - F_i(x_i^c)]\}$
$\ge \max_{\alpha \in [0,1)}\{\alpha[\phi_i(x_i^*(y)) + y^T A_i x_i^*(y)] + (1-\alpha)[\phi_i(x_i^*(y,t)) + y^T A_i x_i^*(y,t)] - t[F_i(x_i^*(y,t)) - F_i(x_i^c)] + t\nu_i\ln(1-\alpha)\}$
(A.3) $= \max_{\alpha \in [0,1)}\{\alpha d_i(y) + (1-\alpha)d_i(y,t) - \alpha t[F_i(x_i^*(y,t)) - F_i(x_i^c)] + t\nu_i\ln(1-\alpha)\}$
$\ge \max_{\alpha \in [0,1)}\{\alpha d_i(y) + (1-\alpha)d_i(y,t) + t\nu_i\ln(1-\alpha) - \alpha t\,\omega_*(\lambda_{F_i}(x_i^*(y,t)))\}$.

Rearranging (A.3), we obtain

(A.4) $d_i(y,t) \ge d_i(y) - t\,\omega_*(\lambda_{F_i}(x_i^*(y,t))) + \dfrac{t\nu_i\ln(1-\alpha)}{\alpha}, \quad \forall \alpha \in (0,1)$.

Since $\frac{\ln(1-\alpha)}{\alpha} \le -1$ for all $\alpha \in (0,1)$ and $\lim_{\alpha\to 0^+}\frac{\ln(1-\alpha)}{\alpha} = -1$, inequality (A.4) implies that

$d_i(y,t) - d_i(y) \ge -t[\omega_*(\lambda_{F_i}(x_i^*(y,t))) + \nu_i]$,

which is the right-hand side of (3.5). The left-hand side of (3.5) follows from the relation $F_i(x_i) - F_i(x_i^c) \ge \omega(\|x_i - x_i^c\|_{x_i^c}) \ge 0$ due to [17]. □

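A.2. The proof of Lemma 3.2. Proof. The second inequality in (3.7) is proved in Lemma 3.1. We now prove the third one. Let us denote $x_i^\tau(y) := x_i^c + \tau(x_i^*(y) - x_i^c)$, where $\tau \in [0,1]$. Since $F_i$ is a $\nu_i$-self-concordant barrier, it follows from [16, inequality (2.3.3)] that

$F_i(x_i^\tau(y)) \le F_i(x_i^c) - \nu_i\ln(1-\tau), \quad \tau \in [0,1)$.

Combining this inequality and the concavity of $\phi_i$, we have

$d_i(y,t) = \max_{x_i \in \mathrm{int}(X_i)}\{\phi_i(x_i) + y^T A_i x_i - t[F_i(x_i) - F_i(x_i^c)]\}$
$\ge \max_{\tau\in[0,1)}\{\phi_i(x_i^\tau(y)) + y^T A_i x_i^\tau(y) - t[F_i(x_i^\tau(y)) - F_i(x_i^c)]\}$
(A.5) $\ge \max_{\tau\in[0,1)}\{(1-\tau)[\phi_i(x_i^c) + y^T A_i x_i^c] + \tau[\phi_i(x_i^*(y)) + y^T A_i x_i^*(y)] + t\nu_i\ln(1-\tau)\}$
$= \max_{\tau\in[0,1)}\{(1-\tau)d_i^c(y) + \tau d_i(y) + t\nu_i\ln(1-\tau)\}$,

with $d_i^c(y) := \phi_i(x_i^c) + y^T A_i x_i^c$. Now, we maximize the function $\xi(\tau) := (1-\tau)d_i^c(y) + \tau d_i(y) + t\nu_i\ln(1-\tau)$ in the last line of (A.5) with respect to $\tau \in [0,1)$ to obtain $\tau^* = \left[1 - \frac{t\nu_i}{d_i(y) - d_i^c(y)}\right]_+$, where $[a]_+ := \max\{0, a\}$. Therefore, if $\frac{d_i(y) - d_i^c(y)}{t\nu_i} \le 1$, i.e., $\tau^* = 0$, then $d_i(y) - d_i^c(y) \le t\nu_i$, and hence $d_i(y) \le d_i(y,t) + t\nu_i$, since $d_i(y,t) \ge \xi(0) = d_i^c(y)$. Otherwise, by substituting $\tau^*$ into the last line of (A.5), we obtain

$d_i(y) \le d_i(y,t) + t\nu_i\left(1 + \ln\dfrac{d_i(y) - d_i^c(y)}{t\nu_i}\right)$.

Summing up this inequality for $i = 1, 2$, we get (3.7). □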
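The maximization of $\xi(\tau)$ in the proof can be sanity-checked numerically. The Python sketch below uses hypothetical values of $d_i^c(y)$ (the value at $x_i^c$), $d_i(y)$, $t$ and $\nu_i$. It compares the closed-form maximizer $\tau^*$ with a grid search and checks that $\xi(\tau^*) = d_i(y) - t\nu_i(1 + \ln((d_i(y) - d_i^c(y))/(t\nu_i)))$, which is the identity behind the last displayed inequality.

```python
import math


def xi(tau, dc, d, t, nu):
    """xi(tau) = (1 - tau) d^c + tau d + t nu ln(1 - tau), the function maximized in (A.5)."""
    return (1.0 - tau) * dc + tau * d + t * nu * math.log(1.0 - tau)


def maximize_xi(dc, d, t, nu):
    """Closed-form maximizer tau* = [1 - t nu / (d - d^c)]_+ on [0, 1) and the maximum value."""
    tau = max(0.0, 1.0 - t * nu / (d - dc)) if d > dc else 0.0
    return tau, xi(tau, dc, d, t, nu)


dc, d, t, nu = 1.0, 3.0, 0.1, 2.0                  # hypothetical values with d - d^c > t nu
tau_star, best = maximize_xi(dc, d, t, nu)
grid_best = max(xi(k / 10000.0, dc, d, t, nu) for k in range(10000))
closed = d - t * nu * (1.0 + math.log((d - dc) / (t * nu)))
print(tau_star, best, grid_best, closed)
```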

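A.3. The proof of Lemma 3.3. Proof. Let us fix $\kappa \in (0,1)$. It is trivial that $\ln(x^{-1}) \le x^{-\kappa}$ for $0 < x < \kappa^{1/\kappa}$. Therefore, we have

$\nu_i t\left(1 + \left[\ln\dfrac{K_i}{\nu_i t}\right]_+\right) \le \nu_i t\left(1 + \left(\dfrac{K_i}{\nu_i t}\right)^{\kappa}\right) = \nu_i t + \nu_i\left(\dfrac{K_i}{\nu_i}\right)^{\kappa}t^{1-\kappa}, \quad \forall t < \dfrac{K_i}{\nu_i}\kappa^{1/\kappa}$.

Consequently, if $t < \min\left\{\dfrac{K_i}{\nu_i}\kappa^{1/\kappa}, \left(\dfrac{\varepsilon}{2[\nu_i + \nu_i(K_i/\nu_i)^{\kappa}]}\right)^{1/(1-\kappa)}\right\}$, then $\nu_i t\left(1 + \left[\ln\frac{K_i}{\nu_i t}\right]_+\right) \le 0.5\varepsilon$. Summing up this inequality for $i = 1, 2$, we get (3.8) in Lemma 3.3. □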